BioData-SF: Bioinformatics Data Standardization Formatter
On-line updated information available at: http://www.bitlab-es.com/biodatasf
Alfredo Martínez, Oswaldo Trelles
BioData-SF. Getting Started
Index
1. INTRODUCTION
1.1 State of the art
1.2 What’s BioData-SF
2. INSTALLATION
2.1 Zipped version
2.2 Java web start version
3. USING THE DeskTopEditor INTERFACE
3.1 Repositories
3.2 Data types tree
3.3 File system
3.4 Collection edition
4. GUIDED EXERCISE
4.1 Data creation/edition
4.2 Data collection creation/edition
4.3 Automatic data recognition
5. BIODATA-SF FOR DEVELOPERS
6. REFERENCES
1 Introduction
1.1 State of the art
Most of the clients that exploit web services provide tools to create data inputs or, sometimes, to edit the service output. In this section we review the state of the art in data creation and editing.
Remora (Carrere S. et al., 2005) is a web wizard for creating workflows from MOBY services, starting from a data type ID or namespace. However, it does not allow the use of collections, nor can it create or edit data collections.
Gbrowse Moby (Wilkinson M., 2006) was the first web client for the BioMOBY community. Its browser-style interface enables the execution of web services, but it is restricted to basic object input.
MOWServ (Navas et al., 2006) exploits the metadata stored in an extended Moby-based repository to create a general object editing interface, including help for populating fields and types (e.g. field description tooltips). Because the data editing interface is generated automatically, new types are supported as soon as they are registered in the repository. MOWServ does not support automatic recognition of user input (data).
Seahawk (Gordon P. et al., 2007) is an applet for the data-driven execution of MOBY web services. It recognizes selected text (pasted text or a link) as Strings or Sequences, automatically converts it to a MOBY object and launches a compatible web service with that data type (only available for sequence data types). If the object is a collection of sequences, Seahawk splits it into its individual sequences, but it does not provide a way to edit or create objects or collections, and all collections must be sequence collections. BioData-SF uses the same recognition system as Seahawk to automatically recognize user inputs.
Dashboard (BioMoby Dashboard, 2010) is a standalone client oriented to web service providers and developers, used for registering and testing web services, with capabilities to create and display MOBY-S objects. Dashboard exposes the underlying XML tags of the Moby format for editing, but this is aimed at developers for service-testing purposes.
1.2 What’s BioData-SF
BioData-SF (Bioinformatics Data Standardization Formatter) is a set of Java libraries aimed at user data editing and management. The library includes the generic data type, data and namespace modules from mAPI (Ramirez et al., 2009), which provide user data management, support for multiple data type and namespace repositories, data collection management, automatic data interface generation, etc.
The mAPI data module has been extended with the “heuristics” classes. Each heuristic class allows automatic recognition of one or more specific data types. The heuristic system is plug-in based, so the library’s recognition capabilities can easily be extended by adding new recognition programs, such as ReadSeq (Gilbert, 2009).
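The plug-in idea can be illustrated with a minimal sketch. The `Heuristic` interface, the `HeuristicRegistry` class and the FASTA rule below are hypothetical names chosen for illustration; they are not the actual mAPI heuristics classes. Each plug-in declares the data type it recognizes and tests the user input, and the registry simply asks every registered plug-in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in contract: one implementation per recognizable data type.
interface Heuristic {
    String dataType();                 // data type this plug-in recognizes
    boolean recognizes(String input);  // true if the input looks like that type
}

// Hypothetical example plug-in: a deliberately rough FASTA amino acid check.
class FastaAminoAcidHeuristic implements Heuristic {
    public String dataType() {
        return "AminoAcidSequence";
    }

    public boolean recognizes(String input) {
        String[] lines = input.trim().split("\\R");
        if (lines.length < 2 || !lines[0].startsWith(">")) {
            return false;
        }
        for (int i = 1; i < lines.length; i++) {
            // Accept only amino acid letters plus '*' (stop) and '-' (gap).
            if (!lines[i].trim().toUpperCase().matches("[ACDEFGHIKLMNPQRSTVWYBZXUO*-]*")) {
                return false;
            }
        }
        return true;
    }
}

// Registry: extending recognition means registering one more plug-in.
class HeuristicRegistry {
    private final List<Heuristic> plugins = new ArrayList<>();

    void register(Heuristic h) {
        plugins.add(h);
    }

    // Returns the data types of every plug-in that accepts the input.
    List<String> recognize(String input) {
        List<String> types = new ArrayList<>();
        for (Heuristic h : plugins) {
            if (h.recognizes(input)) {
                types.add(h.dataType());
            }
        }
        return types;
    }

    public static void main(String[] args) {
        HeuristicRegistry registry = new HeuristicRegistry();
        registry.register(new FastaAminoAcidHeuristic());
        System.out.println(registry.recognize(">seq1\nMKTAYIAKQR\nQISFVKSHFS"));
    }
}
```

Returning a list rather than a single type reflects that one input may match several heuristics; a real heuristic would need stricter rules, since a nucleotide FASTA file also passes this check.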
2 Installation
The standalone client, the desktop object editor, can be found on our web site:
http://bitlab-es.com/biodatasf/
The library itself (the DataModule, DataTypeModule, NamespaceModule and ModularAPICore modules are required) is also available on our web site, in the “Modular API modules download” section: http://bitlab-es.com/mapi/
First of all, you may need to download the latest JDK version from
http://java.sun.com/javase/downloads/index.jsp
2.1 Zipped version
To use the zipped version you need the following:
• An application to unzip the file.
• Java 6 or greater, as we have described previously.
To use this version, click on the “BioData-SF (Zipped version)” link. Then unzip the file and run BioDataSF.bat for debug execution, or BioDataSF.jar for standard execution.
2.2 Java web start version
To use this version you need the following:
• Java 6 or greater, as we have described previously.
Simply click on the “BioData-SF (Java Web Start)” link on our web page and it will be
installed automatically.
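Both versions require Java 6 or greater. If you are unsure which Java version is installed, a small check such as the following sketch (not part of BioData-SF) reads the standard `java.specification.version` system property, which has the form `1.6` up to Java 8 and `9`, `11`, `17`, ... from Java 9 onwards. The class name `JavaVersionCheck` is hypothetical.

```java
// Hypothetical helper: checks the "Java 6 or greater" requirement.
class JavaVersionCheck {
    // Parses "1.6"/"1.8" (old scheme) or "11"/"17" (new scheme) into a major version.
    static int majorVersion(String spec) {
        String[] parts = spec.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    static boolean meetsRequirement(String spec, int required) {
        return majorVersion(spec) >= required;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec);
        System.out.println(meetsRequirement(spec, 6)
                ? "This Java version is suitable for BioData-SF."
                : "Please install Java 6 or greater.");
    }
}
```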
3 Using the DeskTopEditor interface
The DeskTopEditor is a standalone client built using the BioData-SF library. The following sections give a quick overview of its use.
3.1 Repositories
The editor can manage multiple data type and namespace repositories.
To switch between repositories, select one from the “Repository” menu.
Remember to save all edited data before changing repository.
3.2 Data Types tree
The left side of the screen shows the data types of the current repository as a tree.
Double-clicking a data type opens an editing tab in the middle of the screen for creating new data of the selected data type.
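To make the tree structure concrete, the sketch below builds a data type tree from child-to-parent links using the standard Swing `DefaultMutableTreeNode` class, the kind of model a `JTree` displays. The class name `DataTypeTree` and the hierarchy in `main` are illustrative assumptions, not read from an actual repository, and this is not necessarily how DeskTopEditor builds its tree.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.tree.DefaultMutableTreeNode;

class DataTypeTree {
    // Builds a tree from "child -> parent" links.
    static DefaultMutableTreeNode build(String rootName, Map<String, String> parentOf) {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode(rootName);
        Map<String, DefaultMutableTreeNode> nodes = new LinkedHashMap<>();
        for (String type : parentOf.keySet()) {
            nodes.put(type, new DefaultMutableTreeNode(type));
        }
        for (Map.Entry<String, String> e : parentOf.entrySet()) {
            // Parents that are not listed as types themselves (e.g. the root name) hang from the root.
            DefaultMutableTreeNode parent = nodes.getOrDefault(e.getValue(), root);
            parent.add(nodes.get(e.getKey()));
        }
        return root;
    }

    // Prints the tree with two spaces of indentation per level.
    static void print(DefaultMutableTreeNode node, String indent, StringBuilder out) {
        out.append(indent).append(node.getUserObject()).append('\n');
        for (int i = 0; i < node.getChildCount(); i++) {
            print((DefaultMutableTreeNode) node.getChildAt(i), indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        // Illustrative hierarchy only.
        Map<String, String> parentOf = new LinkedHashMap<>();
        parentOf.put("VirtualSequence", "Object");
        parentOf.put("GenericSequence", "VirtualSequence");
        parentOf.put("AminoAcidSequence", "GenericSequence");
        parentOf.put("NucleotideSequence", "GenericSequence");
        parentOf.put("DNASequence", "NucleotideSequence");
        StringBuilder out = new StringBuilder();
        print(build("Object", parentOf), "", out);
        System.out.print(out);
    }
}
```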
3.3 File System
The bottom of the screen shows a file system panel, which can be used to navigate through your local file system. Double-clicking a file opens a new editing tab for editing the data it contains. Right-clicking shows the file menu:
• Standardize: shown only when the selected file cannot be recognized using the default data format (the BioMoby format by default). Selecting this option launches the standardization wizard (see the Automatic data recognition exercise).
• Change format: automatically transforms the file from its source format to the selected format. A list of the formats compatible with the data’s data type is shown. Note that changing the data format may result in information loss.
• Save as: casts the data to another data type; with this option any data can be saved as any of its data type’s parents. A list of the data type’s parents is shown. Note that data type casting may result in information loss.
• View: shows a list of compatible data viewers. The default viewer is the O.S. default viewer (for example, Internet Explorer on Windows).
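The “Save as” casting, and why it can lose information, can be pictured with a simplified model. In this hypothetical sketch (not the BioData-SF API) a data object is a map of named fields, each data type declares its own fields and its parent, and casting to an ancestor keeps only the fields that the ancestor knows about. The type names and fields are an illustrative, simplified hierarchy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DataTypeCasting {
    // Hypothetical type model: a name, a parent (null for the root) and the fields it declares itself.
    static final class DataType {
        final String name;
        final DataType parent;
        final List<String> ownFields;

        DataType(String name, DataType parent, String... ownFields) {
            this.name = name;
            this.parent = parent;
            this.ownFields = Arrays.asList(ownFields);
        }

        // The parents offered by "Save as": every ancestor, nearest first.
        List<String> ancestors() {
            List<String> result = new ArrayList<>();
            for (DataType t = parent; t != null; t = t.parent) {
                result.add(t.name);
            }
            return result;
        }

        // Declared plus inherited fields.
        Set<String> allFields() {
            Set<String> fields = new LinkedHashSet<>();
            if (parent != null) {
                fields.addAll(parent.allFields());
            }
            fields.addAll(ownFields);
            return fields;
        }
    }

    // Casting keeps only the fields the target type knows about; anything else is lost.
    static Map<String, String> cast(Map<String, String> data, DataType target) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : target.allFields()) {
            if (data.containsKey(field)) {
                result.put(field, data.get(field));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DataType object = new DataType("Object", null, "namespace", "id");
        DataType generic = new DataType("GenericSequence", object, "Length", "SequenceString");
        DataType aminoAcid = new DataType("AminoAcidSequence", generic);

        Map<String, String> data = new LinkedHashMap<>();
        data.put("namespace", "example");
        data.put("id", "seq1");
        data.put("Length", "10");
        data.put("SequenceString", "MKTAYIAKQR");

        System.out.println(aminoAcid.ancestors());  // [GenericSequence, Object]
        System.out.println(cast(data, object));     // {namespace=example, id=seq1}
    }
}
```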
3.4 Collection edition
The collection editing tab lets you create your own data collections.
To add data to the collection, click on the add button at the bottom of the data list.
To delete one or more items from the collection, click on the delete button (to the left of the add button).
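The add and delete buttons map onto a very simple model. The hypothetical `DataCollection` class below (not the BioData-SF `CollectionPanel` API) keeps the member files in order, appends one or more files on “add”, and removes the selected positions on “delete”; removing from the highest index down keeps the remaining indices valid.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DataCollection {
    private final List<Path> members = new ArrayList<>();

    // "Add" button: append one or more data files.
    void add(Path... files) {
        members.addAll(Arrays.asList(files));
    }

    // "Delete" button: remove the selected rows (positions in the current list).
    void delete(int... selected) {
        int[] sorted = Arrays.stream(selected).distinct().sorted().toArray();
        for (int i = sorted.length - 1; i >= 0; i--) {
            members.remove(sorted[i]);  // remove(int) removes by position
        }
    }

    List<Path> members() {
        return Collections.unmodifiableList(members);
    }

    public static void main(String[] args) {
        DataCollection c = new DataCollection();
        c.add(Paths.get("seq1.xml"), Paths.get("seq2.xml"), Paths.get("seq3.xml"));
        c.delete(0, 2);
        System.out.println(c.members());  // [seq2.xml]
    }
}
```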
4 Guided exercise
The following exercises are intended to train users on the main capabilities of the system. They are not intended as training in the biological or clinical interpretation of the results.
The data files used in the exercises can be found on the product home page:
http://bitlab-es.com/biodatasf/
4.1 Data creation/edition
Creating new data
Suppose that we have an amino acid sequence in a text file and we are interested, for example, in launching a service with it:
1. First we need to find the data type that fits our data. Double-click on the appropriate data type in the data types tree; in our example we will use the AminoAcidSequence data type:
2. A new editor tab is created in the central panel. This tab shows the interface for the selected data type and allows the data to be edited:
3. Now we can start filling in the data fields:
4. When a field is of String data type, or is Complex data (composed data that has its own fields), the load from file button is available. You can fill in those fields in the same way as described above, or load the content from a file using the load from file button. Note that the selected file must contain valid content for the field; for example, if the field data type is AminoAcidSequence, the file must contain an AminoAcidSequence formatted object. To load the field from an existing file, click on the load from file button:
i. A file browser is shown; select the file that contains the field information and open it.
ii. The field editing area now shows the file path and turns yellow to indicate this.
5. Once you have finished filling in the data fields, the object can be saved. Change the object name at the top of the editor panel and save it (5.i):
6. Once the object has been saved, you can view the result file using the file system panel. Right-click on the result and select View->Default; this opens the file in the O.S. default viewer (Internet Explorer on Windows systems):
7. The O.S. default viewer opens, showing the file content:
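Behind the form, saving produces a file in the default (BioMoby) format. The sketch below builds a comparable document with the standard JAXP DOM and `Transformer` APIs. The element layout (an `AminoAcidSequence` element with `namespace`/`id` attributes holding an `Integer` article named `Length` and a `String` article named `SequenceString`) follows our understanding of the MOBY object format, but treat it as an assumption: the file DeskTopEditor actually writes may also include the MOBY envelope and namespace declarations. The class name `AminoAcidSequenceWriter` is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AminoAcidSequenceWriter {
    // Creates a MOBY-style primitive child such as <String articleName="SequenceString">...</String>.
    static Element article(Document doc, String type, String name, String value) {
        Element e = doc.createElement(type);
        e.setAttribute("namespace", "");
        e.setAttribute("id", "");
        e.setAttribute("articleName", name);
        e.setTextContent(value);
        return e;
    }

    static String toXml(String namespace, String id, String sequence) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("AminoAcidSequence");
        root.setAttribute("namespace", namespace);
        root.setAttribute("id", id);
        root.appendChild(article(doc, "Integer", "Length", Integer.toString(sequence.length())));
        root.appendChild(article(doc, "String", "SequenceString", sequence));
        doc.appendChild(root);

        // Serialize the DOM tree to text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml("", "seq1", "MKTAYIAKQR"));
    }
}
```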
Editing existing data
We can also edit previously saved data:
1. Use the file system panel to navigate to the data you want to edit, then double-click on it:
2. A new editor tab containing the data information is created in the central panel:
3. From here, the data can be edited following the instructions in the previous section on creating new data.
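Opening an existing file means parsing it back into field values. The following is a minimal sketch using the standard DOM parser, assuming a simplified MOBY-style layout: a root object element whose child elements carry an `articleName` attribute. The full MOBY message envelope is not handled, and the class name `MobyFieldReader` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class MobyFieldReader {
    // Returns articleName -> text content for the direct child elements of the root object.
    static Map<String, String> readFields(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {  // skip whitespace text nodes
                Element e = (Element) n;
                fields.put(e.getAttribute("articleName"), e.getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<AminoAcidSequence namespace=\"\" id=\"seq1\">"
                + "<Integer articleName=\"Length\">10</Integer>"
                + "<String articleName=\"SequenceString\">MKTAYIAKQR</String>"
                + "</AminoAcidSequence>";
        System.out.println(readFields(xml));  // {Length=10, SequenceString=MKTAYIAKQR}
    }
}
```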
4.2 Data collection creation/edition
Creating a new data collection
Another main capability of BioData-SF is data collection management. Suppose that we have a set of amino acid sequence objects previously stored in separate files:
1. Select the Collection edition tab, just next to the main tab:
2. To add data files to the collection content list, click on the add new object(s) button:
i. A file browser is shown; select one or more files to add to the collection.
ii. The added data files appear in the collection content.
3. Once you have added all the intended data files to the collection, you can save it. Change the data collection name at the top of the collection panel and click on the save button (3.i):
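Saving the collection bundles the member objects into a single file. In the MOBY format a collection wraps each member in a `Simple` element inside a `Collection` element; the sketch below reproduces that nesting with the standard `Document.importNode` method, deep-copying each parsed member into the new collection document. It omits the MOBY XML namespace and message envelope, and the `articleName` value and the class name `CollectionBuilder` are placeholders, so it illustrates the idea rather than the exact file DeskTopEditor writes.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CollectionBuilder {
    static String build(String articleName, List<String> memberXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element collection = doc.createElement("Collection");
        collection.setAttribute("articleName", articleName);
        doc.appendChild(collection);

        for (String xml : memberXml) {
            Document member = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element simple = doc.createElement("Simple");
            // importNode(node, true) deep-copies the member's root element into this document.
            simple.appendChild(doc.importNode(member.getDocumentElement(), true));
            collection.appendChild(simple);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> members = Arrays.asList(
                "<AminoAcidSequence namespace=\"\" id=\"seq1\"/>",
                "<AminoAcidSequence namespace=\"\" id=\"seq2\"/>");
        System.out.println(build("sequences", members));
    }
}
```

Copying with `importNode` is needed because DOM nodes belong to the document that created them and cannot be appended directly to another document.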
Editing a data collection
To edit a previously stored data collection:
1. Double-click on any collection in the file system panel; note that the collection file icon differs from the regular file icon:
2. A new collection editing tab containing the collection information is created in the central panel:
3. To edit the data collection, simply add or remove elements following the instructions in the previous section on creating a new data collection.
4.3 Automatic data recognition
One of the main capabilities of BioData-SF is automatic data recognition. Suppose that we have a GenBank-formatted file and we want to create a GenBank_Text object from it.
First, use the file system panel to browse the default directory; you will find multiple files that contain data in plain-text format.
1. Right-click on the file and select the “Standardize” option.
2. Wait for the heuristics to finish the standardization process; this does not block the editor interface, so you can keep working with the application.
3. Browse through the heuristic results and select the result(s) you want. For example, we need the GenBank_Text object and also the DNA sequence it contains as an independent object.
4. Select the target format for each result. This format indicates the file format in which each selected result will be saved.
5. Optionally, select a prefix for the results (5.i); all selected results will use this prefix to compose the target file name. You can also optionally select the target directory (5.ii) where the results will be saved; if no directory is selected, the results will be saved in the current directory.
6. Open the newly created “Results” directory (6) to find the results of the standardization process; note that all results have the “.heu” extension.
7. Right-click on a result and select View->Default.
8. This opens the result in the O.S. default viewer (for example, Internet Explorer on Windows).
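Step 3 of this exercise extracts the DNA sequence from the GenBank record as an independent object. As a rough illustration (not the heuristics BioData-SF actually ships), the sketch below reads the residues from the `ORIGIN` section of a GenBank flat file, where each line starts with a position number followed by groups of residues and the record ends at the `//` line. It runs the work with `CompletableFuture` so that the calling thread is not blocked, in the same spirit as the editor staying usable during standardization. The class name and the `prefix + index + ".heu"` naming scheme are hypothetical; the guide only states that results use the chosen prefix and the “.heu” extension.

```java
import java.util.concurrent.CompletableFuture;

class GenBankSequenceExtractor {
    // Collects the residues of the ORIGIN section, e.g. "        1 gatcctccat atacaacggt".
    static String extractSequence(String genbank) {
        StringBuilder seq = new StringBuilder();
        boolean inOrigin = false;
        for (String line : genbank.split("\\R")) {
            if (line.startsWith("ORIGIN")) {
                inOrigin = true;
            } else if (line.startsWith("//")) {
                break;  // end of the record
            } else if (inOrigin) {
                // Drop position numbers and spaces; keep only the residue letters.
                seq.append(line.replaceAll("[^A-Za-z]", ""));
            }
        }
        return seq.toString().toUpperCase();
    }

    // Hypothetical naming scheme for a standardized result.
    static String targetName(String prefix, int index) {
        return prefix + index + ".heu";
    }

    public static void main(String[] args) {
        String record = "LOCUS       EXAMPLE   20 bp    DNA\n"
                + "ORIGIN\n"
                + "        1 gatcctccat atacaacggt\n"
                + "//\n";
        // Run the extraction on a background thread; the caller stays free meanwhile.
        CompletableFuture<String> job = CompletableFuture.supplyAsync(() -> extractSequence(record));
        System.out.println("Working...");
        System.out.println(targetName("result_", 1) + ": " + job.join());
    }
}
```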
5 BioData-SF for developers
To use the standalone DeskTopEditor GUI in your own application, follow this example:
//Requires java.io.FileReader and javax.swing.JPanel, plus the BioData-SF/mAPI classes
//(Sharedinfo, EditorPanel, CollectionPanel, Data, DataType)
//You will need a panel for each DeskTopEditor feature you want to use
//A panel for the editor
JPanel editorp;
//and a panel for the collection editor
JPanel collectionp;
// ...

//The DeskTopEditor configuration file
String configFile = "conf/DeskTopEditorCache.conf";
//The "home" directory; the editor will use it as its default directory
String home = "examples";
//The default file format, used to manage user data
String format = "Moby";
//Create the structure that manages repositories, formats, etc.
Sharedinfo sh = new Sharedinfo(configFile, home, format);
// ...

//Create the editor panel
EditorPanel editor = new EditorPanel();
Data edited_data;
//To create new data from an existing data type
DataType dt = Sharedinfo.getDataModule().getDataTypeModule().
    searchDataType("Sequence").get(0);
//Create the data panel
editor.createPanel(dt);
//Get the editor panel
editorp = editor.getEditorPanel();
// ...

//Or, to edit existing user data
String file_name = "RODC.xml";
//Load the data contained in the file
//(new FileReader(...) throws FileNotFoundException, so call it inside a try block
//or a method that declares the exception)
edited_data = Sharedinfo.getDataModule().newData(new
    FileReader("/examples/".concat(file_name)),
    Sharedinfo.getDefaultFormat());
edited_data = editor.createPanel(edited_data, file_name);
editorp = editor.getEditorPanel();
// ...

//Create the collection panel
CollectionPanel collection = new CollectionPanel();
//Get the collection editor panel
collectionp = collection.createPanel();
//Or, to edit an existing data collection
file_name = "Collection_ODC.xml";
//Load the data collection contained in the file
edited_data = Sharedinfo.getDataModule().newData(new
    FileReader("/examples/".concat(file_name)),
    Sharedinfo.getDefaultFormat());
collection.createPanel(edited_data, file_name);
collectionp = collection.getCollectionPanel();
// ...
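Once you have the editor and collection panels, they can be placed in your own window. The sketch below is one possible arrangement using standard Swing classes, mirroring the main tab and Collection edition tab of DeskTopEditor; the class name `EditorHost` and the tab titles are hypothetical, and plain `JPanel` placeholders stand in for the panels returned by `editor.getEditorPanel()` and `collection.getCollectionPanel()` in the example above. The window is only created when a display is available.

```java
import java.awt.GraphicsEnvironment;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JTabbedPane;
import javax.swing.SwingUtilities;

class EditorHost {
    // Places the two panels in tabs, one per DeskTopEditor feature.
    static JTabbedPane buildTabs(JPanel editorp, JPanel collectionp) {
        JTabbedPane tabs = new JTabbedPane();
        tabs.addTab("Main", editorp);
        tabs.addTab("Collection edition", collectionp);
        return tabs;
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; skipping the window.");
            return;
        }
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            // Placeholders: pass the BioData-SF editor and collection panels here.
            JTabbedPane tabs = buildTabs(new JPanel(), new JPanel());
            JFrame frame = new JFrame("My application");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(tabs);
            frame.setSize(800, 600);
            frame.setVisible(true);
        });
    }
}
```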
6 References
1 BioMoby Dashboard. http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/Dashboard.html.
2 Gilbert D. (2009). Sequence file format conversion with command-line readseq. Current Protocols in Bioinformatics, Appendix 1E.
3 Gordon P.M.K., Sensen C.W. (2007). Seahawk: Moving Beyond HTML in Web-based Bioinformatics Analysis. BMC Bioinformatics 8:208.
4 Navas R. et al. (2006). Intelligent client for integrating bioinformatics services. Bioinformatics 22:106–11.
5 Wilkinson M.D. (2006). Gbrowse moby: a web-based browser for BioMOBY services. Source Code for Biology and Medicine 1:4.