Bioinformatics Data Standardization Formatter -...

BioData-SF: Bioinformatics Data Standardization Formatter

Online updated information available at:

http://www.bitlab-es.com/biodatasf

Alfredo Martínez [email protected]

Oswaldo Trelles [email protected]

BioData-SF. Getting Started

Index

1. INTRODUCTION

1.1 State of the art

1.2 What’s BioData-SF

2. INSTALLATION

2.1 Zipped version

2.2 Java web start version

3. USING THE DeskTopEditor INTERFACE

3.1 Repositories

3.2 Data types tree

3.3 File system

3.4 Collection edition

4. GUIDED EXERCISE

4.1 Data creation/edition

4.2 Data collection creation/edition

4.3 Automatic data recognition

5. BIODATA-SF FOR DEVELOPERS

6. REFERENCES


1 Introduction

1.1 State of the art

Most clients that consume web services provide tools to create data inputs and, sometimes, to edit the service output. This section reviews the state of the art in data creation and editing.

Remora (Carrere S. et al., 2005) is a web wizard for creating workflows of MOBY services starting from a data type ID or namespace, but it does not allow the use of collections, nor can it create or edit data collections.

Gbrowse Moby (Wilkinson M., 2006) was the first web client for the BioMOBY community. Its browser-style interface enables the execution of web services, but it is restricted to basic object input.

MOWServ (Navas et al., 2006) exploits the metadata stored in an extended Moby-based repository to create a general object editing interface, including help for populating fields and types (e.g. field description tooltips). This automated generation of the data editing interface allows new types to be supported as soon as they are registered in the repository. MOWServ does not provide automatic recognition of user input (data).

Seahawk (Gordon P. et al., 2007) is an applet for the data-driven execution of MOBY web services. It recognizes the selected text, whether pasted or in a link, as Strings or Sequences, automatically converts it to a MOBY object and launches a compatible web service for that data type (available only for sequence data types). If the object is a collection of sequences, Seahawk splits it into its individual sequences, but it does not provide a way to edit or create objects or collections, and all collections must be sequence collections. BioData-SF uses the same recognition system as Seahawk to recognize user inputs automatically.

Dashboard (BioMoby Dashboard, 2010) is a standalone client aimed at web service providers and developers, for registering and testing web services, with capabilities to create and display MOBY-S objects. Dashboard exposes the underlying XML tags of the Moby format for editing, but this feature is aimed at developers for service-testing purposes.

1.2 What’s BioData-SF

BioData-SF (Bioinformatics Data Standardization Formatter) is a set of Java libraries aimed at user data editing and management. The library includes the generic data type, data and namespace modules from mAPI (Ramirez et al., 2009), which provide user data management, support for multiple data type and namespace repositories, data collection management, automatic data interface generation, etc.


The mAPI data module has been extended with the “heuristics” classes. Each heuristic class allows automatic recognition of one or more specific data types. The heuristic system is based on plug-ins, which means that the recognition capabilities of the library can easily be extended by adding new recognition programs, such as ReadSeq (Gilbert, 2009).
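A minimal sketch of the plug-in idea follows. The Heuristic interface, the HeuristicRegistry class and the two toy recognizers are hypothetical names chosen for illustration, not the actual mAPI heuristics classes. Each plug-in inspects raw text and reports the data types it recognizes, and the registry asks every registered plug-in in turn, so adding a recognizer only means registering one more object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in contract for illustration: NOT the real mAPI heuristics API.
interface Heuristic {
    // Returns the names of the data types this plug-in recognizes in the text
    // (an empty list if it recognizes nothing).
    List<String> recognize(String text);
}

final class HeuristicRegistry {
    private final List<Heuristic> plugins = new ArrayList<>();

    // Extending the recognition capabilities = registering one more plug-in.
    void register(Heuristic h) {
        plugins.add(h);
    }

    // Ask every registered plug-in and collect all recognized data types.
    List<String> recognizeAll(String text) {
        List<String> found = new ArrayList<>();
        for (Heuristic h : plugins) {
            found.addAll(h.recognize(text));
        }
        return found;
    }
}

final class HeuristicDemo {
    public static void main(String[] args) {
        HeuristicRegistry registry = new HeuristicRegistry();
        // Toy FASTA recognizer: a crude check for a '>' header line.
        registry.register(text -> text.startsWith(">")
                ? List.of("FASTA") : List.<String>of());
        // Toy GenBank recognizer: a crude check for a leading LOCUS line.
        registry.register(text -> text.startsWith("LOCUS")
                ? List.of("GenBank_Text") : List.<String>of());
        System.out.println(registry.recognizeAll(">seq1\nMKV"));
    }
}
```

Running HeuristicDemo prints [FASTA]. A real heuristic would also build the recognized objects rather than only naming their types, and its format checks would be far stricter than these first-line tests.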


2 Installation

The standalone client, the desktop object editor, can be found on our web site:

http://bitlab-es.com/biodatasf/

The library can also be found on our web site, in the “Modular API modules download” section at http://bitlab-es.com/mapi/ (the DataModule, DataTypeModule, NamespaceModule and ModularAPICore modules are required).

First of all, you may need to download the latest JDK from

http://java.sun.com/javase/downloads/index.jsp



2.1 Zipped version

To use the zipped version you need the following:

• An application to unzip the file.

• Java 6 or greater, as described above.

To use this version, click on the “BioData-SF (Zipped version)” link and unzip the file. Then run the BioDataSF.bat file for debug execution, or BioDataSF.jar for standard execution.

2.2 Java web start version

To use this version you need the following:

• Java 6 or greater, as described above.

Simply click on the “BioData-SF (Java Web Start)” link on our web page and it will be installed automatically.


3 Using the DeskTopEditor interface

The DeskTopEditor is a standalone client built on the BioData-SF library. The following sections give a quick overview of its use.

3.1 Repositories

The editor can manage multiple data type and namespace repositories. To switch between repositories, select one from the “Repository” menu. Remember to save all edited data before changing repository.

3.2 Data Types tree

The left side of the screen shows the data types of the current repository as a tree. Double-clicking any data type opens an editing tab in the middle of the screen for creating new data of the selected data type.

3.3 File System

The bottom of the screen shows the file system panel, which can be used to navigate your local file system. Double-clicking a file opens a new editing tab for editing the data it contains. Right-clicking a file shows the file menu:

• Standardize: this option is shown only when the selected file cannot be recognized using the default data format (BioMoby by default). Selecting this option launches the standardization wizard (see section 4.3, Automatic data recognition).

• Change format: this option automatically transforms the file from its source format into the selected format. A list of formats compatible with the data's data type is shown. Note that changing the data format may result in information loss.

• Save as: this option casts the data's data type; with it, any data can be saved as any of its data type's parents. A list of the data type's parents is shown. Note that data type casting may result in information loss.

• View: shows a list of compatible data viewers. The default viewer is the O.S. default viewer (for example, Internet Explorer on Windows).
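The Save as option relies on the data type hierarchy, and the information loss it warns about comes from fields that the parent type does not declare. The sketch below illustrates this with hypothetical TypeNode and TypeCast classes (not mAPI classes); the three-level hierarchy Object, Sequence, AminoAcidSequence and its field names are assumptions for illustration, not read from a real repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a repository data type (not the mAPI DataType class).
final class TypeNode {
    final String name;
    final TypeNode parent;        // null for the root type
    final List<String> ownFields; // fields declared by this type itself

    TypeNode(String name, TypeNode parent, String... ownFields) {
        this.name = name;
        this.parent = parent;
        this.ownFields = Arrays.asList(ownFields);
    }

    // "Save as" candidates: every ancestor, nearest first.
    List<String> parents() {
        List<String> out = new ArrayList<>();
        for (TypeNode t = parent; t != null; t = t.parent) {
            out.add(t.name);
        }
        return out;
    }

    // Inherited fields first, then the fields this type adds.
    Set<String> allFields() {
        Set<String> fields = new LinkedHashSet<>();
        if (parent != null) {
            fields.addAll(parent.allFields());
        }
        fields.addAll(ownFields);
        return fields;
    }
}

final class TypeCast {
    // Keep only the fields the target type knows about; the rest is lost.
    static Map<String, String> castTo(Map<String, String> data, TypeNode target) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : target.allFields()) {
            if (data.containsKey(field)) {
                result.put(field, data.get(field));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative hierarchy and field names (assumed).
        TypeNode object = new TypeNode("Object", null, "namespace", "id");
        TypeNode sequence = new TypeNode("Sequence", object, "SequenceString");
        TypeNode aaSeq = new TypeNode("AminoAcidSequence", sequence, "Description");

        Map<String, String> data = new LinkedHashMap<>();
        data.put("namespace", "example");
        data.put("id", "seq-001");
        data.put("SequenceString", "MKVLAAGIVG");
        data.put("Description", "example protein");

        System.out.println(aaSeq.parents());        // prints [Sequence, Object]
        System.out.println(castTo(data, sequence)); // Description is dropped
    }
}
```

Casting the example data to Sequence keeps namespace, id and SequenceString and drops Description, which is the kind of loss the Save as option warns about.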

3.4 Collection edition

The collection edition tab allows you to create your own data collections.

To add data to the collection, click on the add button at the bottom of the data list. To delete one or more data items from the collection, click on the delete button (to the left of the add button).
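The add and delete buttons manage an ordered list of data files. The sketch below models that list with a hypothetical DataCollectionSketch class (not the BioData-SF CollectionPanel API). The one detail worth noting is that several selected rows are deleted from the highest index down, so that each removal does not shift the rows still to be removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the collection edition tab's content list
// (not the BioData-SF CollectionPanel API).
final class DataCollectionSketch {
    private final List<String> members = new ArrayList<>();

    // "Add" button: append one or more data files, keeping their order.
    void add(String... files) {
        members.addAll(Arrays.asList(files));
    }

    // "Delete" button: remove the selected rows. Duplicated selections are
    // ignored, and rows are removed from the highest index down so that
    // earlier removals do not shift later ones.
    void delete(int... selected) {
        int[] rows = Arrays.stream(selected).distinct().sorted().toArray();
        for (int i = rows.length - 1; i >= 0; i--) {
            members.remove(rows[i]); // remove(int index), not remove(Object)
        }
    }

    List<String> members() {
        return Collections.unmodifiableList(members);
    }

    public static void main(String[] args) {
        DataCollectionSketch c = new DataCollectionSketch();
        c.add("seq1.xml", "seq2.xml", "seq3.xml");
        c.delete(0, 2);
        System.out.println(c.members()); // prints [seq2.xml]
    }
}
```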


4 Guided exercise

The following exercise is intended as training on the main capabilities of the system. It is not intended as training in the biological or clinical interpretation of the results. The data files used in the exercises can be found on the product home page.


4.1 Data creation/edition

Creating new data

Suppose that we have an amino acid sequence in a text file and we want, for example, to launch a service with it:

1. First, find the data type that fits our data and double-click it in the data types tree. In our example we will use the AminoAcidSequence data type:



2. A new editor tab is created in the central panel. The tab shows the interface of the selected data type and allows the data to be edited:


3. Now we can start filling in the data fields:

4. Fields of String type, and complex (composed) fields that have their own fields, also offer a load from file button. You can fill in these fields in the same way as described above, or load their content from a file using the load from file button. Note that the selected file must have valid content for the field: for example, if the field data type is AminoAcidSequence, the file must contain an AminoAcidSequence formatted object. To load a field from an existing file, click on the load from file button:

i. A file browser is shown; select the file that contains the field information and open it.

ii. The field editing area now shows the file path and turns yellow to indicate this.



5. Once you have finished filling in the data fields, the object can be saved. Change the object name at the top of the editor panel and save it (5.i):

6. Once the object has been saved, you can view the result file using the file system panel. Right-click on the result and go to View->Default; this will use the default O.S. viewer for this file (Internet Explorer on Windows systems):

7. The default O.S. viewer opens, showing the file content:

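Because the editor tab is generated automatically from the data type definition, the rule in step 4 can be expressed as a walk over the field tree. The sketch below is a hypothetical illustration: the FieldDef and EditorRows classes and the example fields are assumptions, not BioData-SF classes. It lists one editor row per field, indents the sub-fields of complex fields, and marks String and complex fields as offering the load from file option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical field definition: a primitive field has no children,
// a complex (composed) field has its own sub-fields.
final class FieldDef {
    final String name;
    final String type;
    final List<FieldDef> children;

    FieldDef(String name, String type, FieldDef... children) {
        this.name = name;
        this.type = type;
        this.children = Arrays.asList(children);
    }

    boolean isComplex() {
        return !children.isEmpty();
    }
}

final class EditorRows {
    // One editor row per field of the data type, sub-fields indented.
    static List<String> rows(FieldDef dataType) {
        List<String> out = new ArrayList<>();
        for (FieldDef field : dataType.children) {
            collect(field, 0, out);
        }
        return out;
    }

    // Depth-first walk; String and complex fields get "load from file".
    private static void collect(FieldDef f, int depth, List<String> out) {
        boolean loadable = f.isComplex() || f.type.equals("String");
        out.add("  ".repeat(depth) + f.name + " : " + f.type
                + (loadable ? " [load from file]" : ""));
        for (FieldDef child : f.children) {
            collect(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Illustrative definition; real field names come from the repository.
        FieldDef aaSeq = new FieldDef("AminoAcidSequence", "AminoAcidSequence",
                new FieldDef("SequenceString", "String"),
                new FieldDef("Length", "Integer"));
        rows(aaSeq).forEach(System.out::println);
    }
}
```

Running EditorRows prints "SequenceString : String [load from file]" followed by "Length : Integer".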


Editing existing data

We can also edit previously saved data:

1. Navigate with the file system panel until you find the data you want to edit, then double-click on it:

2. A new editor tab containing the data is created in the central panel:

3. From here on, the data can be edited following the instructions of the previous section on creating new data.

4.2 Data collection creation/edition

Creating a new data collection

Another main capability of BioData-SF is data collection management. Suppose that we have a set of amino acid sequence objects previously stored in separate files:



1. Select the Collection edition tab, next to the main tab:

2. To add data files to the collection content list, click on the add new object/s button:

i. A file browser is shown; select one or more files to add to the collection.

ii. The added data files appear in the collection content.



3. Once you have added all the intended data files to the collection, you can save it. Change the data collection name at the top of the collection panel and click on the save button (3.i):



Editing a data collection

To edit a previously stored data collection:

1. Double-click on any collection in the file system panel; note that the collection file icon differs from the regular file icon:

2. A new collection edition tab containing the collection information is created in the central panel:



3. To edit the data collection, simply add or remove elements following the instructions of the previous section on creating a new data collection.

4.3 Automatic data recognition

One of the main capabilities of BioData-SF is automatic data recognition. Suppose that we have a GenBank-formatted file and we want to create a GenBank_Text object from it. First, use the file system panel to browse the default directory, where you will find several files containing data in plain-text format.
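Behind the Standardize option, a heuristic must first recognize the file format and then extract the objects it contains. The sketch below shows that idea for the GenBank case used in this exercise; GenBankSketch is a hypothetical class, not the BioData-SF GenBank heuristic, and it handles only a single record. It relies on two features of the GenBank flat file format: a record starts with a LOCUS line, and the sequence follows an ORIGIN line, written as numbered lines of bases, up to the // terminator.

```java
import java.util.Optional;

// Hypothetical heuristic (not the BioData-SF GenBank recognizer): it checks
// for a GenBank flat file and pulls out the ORIGIN sequence of one record.
final class GenBankSketch {
    // A GenBank record starts with a LOCUS line.
    static boolean looksLikeGenBank(String text) {
        return text.stripLeading().startsWith("LOCUS");
    }

    // Collect the bases between the ORIGIN line and the "//" terminator,
    // dropping the position numbers and spaces of each sequence line.
    static Optional<String> extractSequence(String text) {
        if (!looksLikeGenBank(text)) {
            return Optional.empty();
        }
        StringBuilder seq = new StringBuilder();
        boolean inOrigin = false;
        for (String line : text.split("\\R")) {
            if (line.startsWith("ORIGIN")) {
                inOrigin = true;
            } else if (line.startsWith("//")) {
                break;
            } else if (inOrigin) {
                seq.append(line.replaceAll("[^A-Za-z]", ""));
            }
        }
        return inOrigin ? Optional.of(seq.toString().toUpperCase()) : Optional.empty();
    }

    public static void main(String[] args) {
        String record = String.join("\n",
                "LOCUS       TOY0001   20 bp    DNA     linear   SYN",
                "ORIGIN",
                "        1 acgtacgtac gtacgtacgt",
                "//");
        System.out.println(extractSequence(record).orElse("not GenBank"));
    }
}
```

Running main prints ACGTACGTACGTACGTACGT; a string like this is what would become the independent DNA sequence object selected in step 3 below.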

1. Right-click on the file and select the “Standardize” submenu.

2. Wait for the heuristics to finish the standardization process; this does not block the editor interface, so you can keep working with the application.



3. Browse the heuristic results and select the result/s you want. For example, we need the GenBank_Text object and also the DNA sequence it contains as an independent object.

4. Select the target format of each result. This format determines the file format in which each selected result is saved.



5. Optionally, set a prefix for the result/s (5.i); all selected results will use this prefix to compose the target file name. You can also select the target directory (5.ii) where the result/s will be saved (optional); if no directory is selected, the result/s will be saved in the current directory.

6. Go into the created “Results” directory (6) to find the result/s of the standardization process; note that all results have the “.heu” extension.

7. Right-click on a result and select View->Default.



8. This uses the default O.S. viewer for the result (for example, Internet Explorer on Windows).
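Step 2 notes that standardization runs without blocking the editor. One common way to get that behavior in Java is to run the job on a worker thread and poll or wait for its Future, as in the sketch below; standardize is a hypothetical placeholder that only sleeps and returns illustrative type names. In a Swing application such as the DeskTopEditor, the same idea would typically be written with javax.swing.SwingWorker, which also delivers the result back on the event dispatch thread; a plain ExecutorService is used here so that the sketch runs without a display.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: run a (hypothetical) standardization job on a worker thread so the
// calling thread, standing in for the GUI thread, stays free.
final class BackgroundStandardize {
    // Placeholder for a slow heuristic run; returns illustrative type names.
    static List<String> standardize(String fileContent) throws InterruptedException {
        Thread.sleep(200); // simulate work
        return fileContent.startsWith("LOCUS")
                ? List.of("GenBank_Text", "DNASequence")
                : List.<String>of();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<List<String>> job = worker.submit(() -> standardize("LOCUS ..."));
            // The caller is not blocked here and could keep handling user input.
            while (!job.isDone()) {
                System.out.println("editor still responsive...");
                Thread.sleep(50);
            }
            System.out.println("results: " + job.get());
        } finally {
            worker.shutdown();
            worker.awaitTermination(1, TimeUnit.SECONDS);
        }
    }
}
```

The number of "editor still responsive..." lines varies between runs; the last line lists the illustrative result types.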

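Steps 5 and 6 describe how the output file name and location are chosen. The sketch below composes such a path with java.nio.file. The exact naming rule (prefix, then data type name, then the .heu extension) and the ResultNames class are assumptions for illustration: the guide only states that the prefix is part of the name, that the extension is .heu, and that the current directory is used when no directory is selected.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of composing a result file path. The naming scheme
// (prefix + data type name + ".heu") is an assumption for illustration.
final class ResultNames {
    static Path resultPath(String prefix, String dataTypeName, Path targetDir) {
        // No directory selected: fall back to the current directory.
        Path dir = (targetDir != null) ? targetDir : Paths.get("");
        // No prefix selected: the prefix is simply empty.
        String p = (prefix == null) ? "" : prefix;
        return dir.resolve(p + dataTypeName + ".heu");
    }

    public static void main(String[] args) {
        System.out.println(resultPath("run1_", "GenBank_Text", Paths.get("Results")));
        System.out.println(resultPath(null, "DNASequence", null));
    }
}
```

On a Unix-like system, main prints Results/run1_GenBank_Text.heu and then DNASequence.heu (a path relative to the current directory).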


5 BioData-SF for developers

To use the standalone DeskTopEditor GUI in your application, follow the example below:

//JDK imports used below; the BioData-SF/mAPI classes (Sharedinfo,
//EditorPanel, CollectionPanel, DataType, Data) must also be imported
//from the library jars (their package names are not shown here)
import javax.swing.JPanel;
import java.io.FileReader;

//You will need a panel for each DeskTopEditor feature you want to use
//A panel for the editor panel
JPanel editorp;
//and a panel for the collection editor
JPanel collectionp;
.
.
.
//The DeskTopEditor configuration file
String configFile = "conf/DeskTopEditorCache.conf";
//The "home" directory; the editor will use it as the default directory
String home = "examples";
//The default file format, used to manage user data
String format = "Moby";
//Create the structure that manages repositories, formats, etc.
Sharedinfo sh = new Sharedinfo(configFile, home, format);
.
.
.
//Create the editor panel
EditorPanel editor = new EditorPanel();
Data edited_data;
//To create new data from an existing data type
DataType dt = Sharedinfo.getDataModule().getDataTypeModule()
        .searchDataType("Sequence").get(0);
//Create the data panel
editor.createPanel(dt);
//Get the editor panel
editorp = editor.getEditorPanel();
.
.
.
//Or, to edit existing user data
String file_name = "RODC.xml";
//Load the data within the file (new FileReader(...) throws
//FileNotFoundException, so the enclosing method must handle or declare it)
edited_data = Sharedinfo.getDataModule().newData(
        new FileReader("/examples/".concat(file_name)),
        Sharedinfo.getDefaultFormat());
edited_data = editor.createPanel(edited_data, file_name);
editorp = editor.getEditorPanel();
.
.
.
//Create the collection panel
CollectionPanel collection = new CollectionPanel();
//Get the collection editor panel
collectionp = collection.createPanel();

//Or, to edit an existing data collection
file_name = "Collection_ODC.xml";
//Load the data collection within the file
edited_data = Sharedinfo.getDataModule().newData(
        new FileReader("/examples/".concat(file_name)),
        Sharedinfo.getDefaultFormat());
collection.createPanel(edited_data, file_name);
collectionp = collection.getCollectionPanel();
.
.
.
.


6 References

1 BioMoby Dashboard. http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/Dashboard.html.

2 Gilbert D. (2009). Sequence file format conversion with command-line readseq. Current Protocols in Bioinformatics, Appendix 1E.

3 Gordon P.M.K., Sensen C.W. (2007). Seahawk: moving beyond HTML in Web-based bioinformatics analysis. BMC Bioinformatics 8:208.

4 Navas R. et al. (2006). Intelligent client for integrating bioinformatics services. Bioinformatics 22:106–111.

5 Wilkinson M.D. (2006). Gbrowse Moby: a Web-based browser for BioMOBY services. Source Code for Biology and Medicine 1:4.
