0. Validator Tutorial: Introduction - HUPO-PSI … – Documentation 28. Apr. 2014

Validator – Documentation 28. Apr. 2014

0. Validator Tutorial: Introduction

Overview of the architecture

The PSI Validator is a framework for validating data against a set of rules. These rules can define how controlled vocabularies and ontologies are used, but they can also be arbitrary rules defined and implemented by the developer of a specific validator instance.

A. Bird's Eye View

B. Technologies and Requirements

The validator framework is written in Java and uses Maven 2 as its build system. The framework is mostly configured through XML files. Should you wish to write your own validator, the following requirements apply:

Java 5 or higher

Maven 2 or higher (if you wish to take advantage of the existing infrastructure)

A data model written in Java (this is the data you are going to validate); you can also use our sample data model to try the validator out.

C. Validator's Components

The validator is built in a component-oriented manner; here is a short description of the major components:

Controlled vocabularies and ontologies access: this module provides unified access to controlled vocabularies and ontologies (whether they are available locally or remotely) through a simple API.

Controlled Vocabulary Mapping Rules define how controlled vocabularies and ontologies are used in a specific data model. By means of XPath-like expressions, one can define which ontology terms are allowed at a specific location of the data model.

User Defined Rules are defined and implemented by the validator's developer when mapping rules cannot express the desired validation. These rules have access to the controlled vocabularies and ontologies, and their complexity can potentially be much higher, since YOU are coding them.

Below is a simple comparison of the two kinds of rules a validator can be built upon:

D. Flow of a Validation

Once you have a data model and a validator consisting of a set of rules, you can run your first validation. Here is how this is done, step by step:

1. The data model is submitted to the validator. Usually one submits specific objects to validate rather than the whole data model at once. However, every data model is different and the granularity of its objects varies accordingly; consequently, you have to define for yourself what your unit of work is going to be (e.g. a car in a car factory, a molecular interaction in a proteomics experiment, ...).

2. If any CV mapping rules have been defined, the validator first runs an internal validation on them and removes any that are not valid. The submitted model is then checked against the remaining rules. Should a validation problem occur, a message is returned; it includes a description of the issue and a level of severity (one of DEBUG, INFO, WARN, ERROR, FATAL).

3. If any user-defined rules have been defined, the validator runs each of them on the data model; here again, messages can be generated when problems are found.

4. All messages are returned to the user, who is then free to process them.
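The four steps above can be pictured as a small loop. The following is a minimal, self-contained sketch, not the framework's actual code: `Level`, `Message`, `Rule`, `MappingRule` and `ValidationFlow` are hypothetical stand-ins (the real framework has its own message and rule classes), and reporting a skipped mapping rule as a WARN message is a choice made only for this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Severity levels named in step 2 (hypothetical stand-in enum).
enum Level { DEBUG, INFO, WARN, ERROR, FATAL }

// Hypothetical stand-in for a validator message: a description plus a severity.
class Message {
    final String text;
    final Level level;

    Message(String text, Level level) {
        this.text = text;
        this.level = level;
    }

    @Override
    public String toString() {
        return level + ": " + text;
    }
}

// Hypothetical contract shared by both kinds of rules: inspect one unit of work.
interface Rule<T> {
    Collection<Message> check(T unit);
}

// A CV mapping rule can also report whether its own definition is valid.
interface MappingRule<T> extends Rule<T> {
    boolean isValid();
}

class ValidationFlow<T> {
    private final List<MappingRule<T>> mappingRules;
    private final List<Rule<T>> userRules;

    ValidationFlow(List<MappingRule<T>> mappingRules, List<Rule<T>> userRules) {
        this.mappingRules = mappingRules;
        this.userRules = userRules;
    }

    // Step 1: the caller submits one unit of work (e.g. one experiment).
    List<Message> validate(T unit) {
        List<Message> messages = new ArrayList<Message>();
        // Step 2: skip mapping rules whose definition is invalid, run the others.
        for (MappingRule<T> rule : mappingRules) {
            if (!rule.isValid()) {
                messages.add(new Message("CV mapping rule ignored: invalid definition", Level.WARN));
                continue;
            }
            messages.addAll(rule.check(unit));
        }
        // Step 3: run every user-defined rule on the same unit.
        for (Rule<T> rule : userRules) {
            messages.addAll(rule.check(unit));
        }
        // Step 4: return all messages; processing them is up to the caller.
        return messages;
    }
}
```

Keeping both kinds of rules behind one small contract is what lets the runner treat them uniformly and simply concatenate their messages.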

This tutorial will now take you through the steps required to build a validator and is organised as follows:

1. How to write your own validator?

2. Getting access to the needed Ontologies and Controlled vocabularies

3. Building rules to map ontologies and controlled vocabulary terms to your domain model

4. Building your own rules

5. Wiring it together: build your validator and run it on sample data

6. Download Validator's tutorial source code

E. Contact

Should you have any further questions about the Validator, please send an email to skerrien [at] ebi

[dot] ac [dot] uk

1. Validator Tutorial: How to Write Your Own Validator

In this section, we give more information about what you should do if you are planning to write your own validator.

a. Requirements

Java 5 or above (http://www.oracle.com/technetwork/java/index.html)

Maven 2 or above (http://maven.apache.org/). This is not strictly a mandatory requirement, but as we developed the framework with Maven, there are many advantages to be gained from using it. Should you choose not to use it, please be aware that we have made available on SourceForge a version of the validator framework that contains all the necessary dependencies.

A Java IDE to ease development (in this tutorial I will mostly refer to IntelliJ 13.x, http://www.jetbrains.com/idea)

b. Defining your needs

Here are a few questions you could ask yourself before going any further:

What needs to be checked?

What part of my data model?

Am I using ontologies and controlled vocabularies?

Which ontologies and controlled vocabularies does my model use?

Are these ontologies available in OBO format?

Are they available in the Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup)? On the Internet? On my local computer?

Is there anything else I need to check?

How would I validate it, and how can I implement it?

In the following sections we are going to define more precisely how to use the various components of

the validator.

2. Validator Tutorial: Getting access to the needed Ontologies and Controlled vocabularies

In this section, we look in more detail at how one can deal with ontologies and controlled vocabularies in the validator framework. To start with, you have to define which ontologies will be required to validate your data model. It could be any ontology available in the Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup/ontologyList.do), or any data in OBO format available on the network or locally.

A. Configuration file

- Representation of the schema, as well as the location of the XSD

XSD available at: http://www.psidev.info/files/validator/CvSourceList.xsd

1. Description of the attributes

source - Physical source of the CV file or term information. The keywords 'OLS' or 'file' should be used in this attribute, coupled with the appropriate URI. A fully qualified class name is also allowed, provided the class implements the ontology loader interface (i.e. psidev.psi.tools.ontology_manager.interfaces.OntologyAccess) and has a public default constructor.

name - Name of the CV as in the PSI CV resource.

identifier - Internal identifier for the CV source to be cross-referenced in the CVTerm instances.

uri - Universal identifier of the CV resource.

format - Format of the CV: consistently use the upper-case acronym of the CV language, e.g. 'OBO', 'OWL', or the 'plain text' keyword when applicable.

version - Version of the OBO format used.
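To make the attributes above concrete, here is a minimal sketch that reads CV source entries with the JDK's built-in DOM parser. The element layout in `SAMPLE` (a `CvSourceList` root holding `CvSource` elements) and every value in it are hypothetical; the CvSourceList.xsd linked above is the authoritative description of the structure. `CvSource`, `CvSourceListReader` and `CvSourceListDemo` are illustrative classes, not part of the framework.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// One CV source with the attributes described above (missing ones read as "").
class CvSource {
    final String identifier, name, source, uri, format, version;

    CvSource(Element e) {
        identifier = e.getAttribute("identifier");
        name = e.getAttribute("name");
        source = e.getAttribute("source");
        uri = e.getAttribute("uri");
        format = e.getAttribute("format");
        version = e.getAttribute("version");
    }

    // 'OLS' and 'file' are keywords; any other value is taken as a class name.
    boolean usesCustomLoader() {
        return !source.equalsIgnoreCase("OLS") && !source.equalsIgnoreCase("file");
    }
}

class CvSourceListReader {
    static List<CvSource> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("CvSource");
            List<CvSource> sources = new ArrayList<CvSource>();
            for (int i = 0; i < nodes.getLength(); i++) {
                sources.add(new CvSource((Element) nodes.item(i)));
            }
            return sources;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read CvSourceList", e);
        }
    }
}

class CvSourceListDemo {
    // Hypothetical content: the element layout is assumed, not copied from the XSD.
    static final String SAMPLE =
            "<CvSourceList>"
            + "<CvSource identifier='MOD' name='PSI-MOD' source='OLS' format='OBO'/>"
            + "<CvSource identifier='SPE' name='molecule type' source='file' format='OBO'"
            + " uri='classpath:molecule-type.obo'/>"
            + "<CvSource identifier='EXT' name='my OWL ontology' format='OWL'"
            + " source='com.company.ontology.OwlAccess' uri='http://example.org/ext.owl'/>"
            + "</CvSourceList>";

    public static void main(String[] args) {
        for (CvSource s : CvSourceListReader.read(SAMPLE)) {
            System.out.println(s.identifier + " -> " + s.source
                    + (s.usesCustomLoader() ? " (custom loader)" : ""));
        }
    }
}
```

The comparison in `usesCustomLoader` is case-insensitive because the text spells the keyword both as 'file' and as FILE.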

2. Sample file

You can download this sample file here: ontologies.xml

B. Different types of access

The framework currently allows several ways to access a controlled vocabulary or ontology resource.

We are going to describe below some of the facilities provided:

source={OLS, FILE, user-defined-class}

1. File

This is essentially any OBO file that can be found via a URL (http, ftp, file...) or in the classpath of the running application.

a. Using a local file

A local file can be accessed by defining a URL that uses the file protocol; here is an example:

b. Using a URL

Here is an example of access using the HTTP protocol:

c. Using your classpath

If you have made an OBO file available in your classpath, you can access it by prefixing the URI with classpath:; here is an example:
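The three variants above differ only in how the URI is turned into a stream. Here is a minimal sketch of that resolution step using only the JDK, assuming the `classpath:` prefix is stripped and the rest looked up as a class-loader resource, while `file:`, `http:` and `ftp:` URIs go through `java.net.URL`. `OboLocator` is a hypothetical helper, not the framework's loader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how the three kinds of 'uri' values could be opened.
class OboLocator {
    private static final String CLASSPATH_PREFIX = "classpath:";

    static InputStream open(String uri) throws IOException {
        if (uri.startsWith(CLASSPATH_PREFIX)) {
            String resource = uri.substring(CLASSPATH_PREFIX.length());
            // ClassLoader resource names must not start with '/'.
            if (resource.startsWith("/")) {
                resource = resource.substring(1);
            }
            InputStream in = OboLocator.class.getClassLoader().getResourceAsStream(resource);
            if (in == null) {
                throw new IOException("Not found on the classpath: " + resource);
            }
            return in;
        }
        // file:, http:, ftp: ... are all handled by java.net.URL.
        return URI.create(uri).toURL().openStream();
    }

    // Convenience: read the first line, e.g. an OBO "format-version:" header.
    static String firstLine(String uri) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(uri), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }
}
```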

2. Ontology Lookup Service

As of May 23rd 2008, OLS integrated 61 ontologies and 720,114 terms, through which one can access GO, PSI-MI, PSI-MS, PSI-MOD... The Ontology Manager module is provided with an implementation that uses OLS to access ontologies and controlled vocabularies.

Please note that when using OLS, the URI of the source is not mandatory, as OLS relies on the source's identifier to access the data. A complete list of all supported identifiers can be found on the OLS web site.

3. Writing your own implementation of OntologyAccess

Currently, only the OBO format is supported. Should one of the ontologies or controlled vocabularies you use not be supported, you can extend the functionality of the Ontology Manager.

You can write your own class that implements psidev.psi.tools.ontology_manager.interfaces.OntologyAccess.

Now let's say you have implemented an OWL access in the following class:

com.company.ontology.OwlAccess

You can then declare a new CvSource using it as follows:

Obviously, the compiled class OwlAccess has to be in the classpath when running the validator.
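The requirement that a custom source class has a public default constructor points to reflective instantiation from the class name given in the source attribute. The sketch below shows that pattern with plain JDK reflection. `SimpleOntologyAccess` is a deliberately simplified, hypothetical interface standing in for `psidev.psi.tools.ontology_manager.interfaces.OntologyAccess`, whose actual methods are not reproduced here; `InMemoryAccess` plays the role of `com.company.ontology.OwlAccess`.

```java
import java.lang.reflect.Constructor;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the real OntologyAccess interface.
interface SimpleOntologyAccess {
    void load(String identifier, String uri);

    boolean hasTerm(String accession);
}

class AccessLoader {
    // Instantiate the class named in a CvSource 'source' attribute.
    static SimpleOntologyAccess instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            if (!SimpleOntologyAccess.class.isAssignableFrom(type)) {
                throw new IllegalArgumentException(className + " does not implement SimpleOntologyAccess");
            }
            // getConstructor() only finds a public no-argument constructor.
            Constructor<?> constructor = type.getConstructor();
            return (SimpleOntologyAccess) constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}

// Toy implementation playing the role of com.company.ontology.OwlAccess.
class InMemoryAccess implements SimpleOntologyAccess {
    private final Set<String> terms = new HashSet<String>();

    public InMemoryAccess() {
    }

    @Override
    public void load(String identifier, String uri) {
        // A real implementation would parse the resource found at 'uri' here.
        terms.add(identifier + ":0001");
    }

    @Override
    public boolean hasTerm(String accession) {
        return terms.contains(accession);
    }
}
```

`Class.forName` can only succeed if the compiled class is reachable by the class loader, which is exactly why OwlAccess has to be on the classpath at runtime.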

3. Validator Tutorial: How to Build CV Mapping Rules?

In this section we are going to see how one can simply define a direct mapping between a data model

and a set of ontologies/controlled vocabularies.

A. Defining how the model is supposed to relate to the ontologies

This is a crucial step in the design of your mapping rules as you are going to define which part of the

data model is going to map to which specific part of the ontologies or controlled vocabularies.

B. Formalizing this binding in a configuration file

b1. Format of the configuration file

XSD available here.

Definition of the attributes of each element:

CvMapping

modelName - Name of the PSI data exchange schema, e.g. mzML, GelML, MIF.

modelURI - URI of the data exchange schema.

modelVersion - Version number of the model supported by the CvMapping file.

CvReference

cvIdentifier - Short label for the CV or namespace; this should correspond to the identifier attribute of a CvSource in the CvSourceList configuration file.

cvName - Full descriptive name for the CV.

CvMappingRule

id - Unique identifier for this rule in the scope of the current configuration file. Identifiers are alphanumeric.

name - A short name for this rule. This may be used for error reporting.

scopePath - Element scope in the schema within which the non repeatable (isRepeatable = FALSE)

condition applies.

cvElementPath - The full XPath expression that defines the part of the data model we are mapping.

cvTermsCombinationLogic - Boolean operator describing the combination logic of multiple

CvTerm elements associated with the same CvMappingRule.

requirementLevel - Indicates, when the XML element exists in the instance data file, whether the association with CV terms is optional (MAY), recommended (SHOULD) or mandatory (MUST).

CvTerm

cvIdentifierRef - Internal reference (e.g. namespace abbreviation) to a term source file as defined

in a CvReference element.

termAccession - CV term accession number as in the CV file.

termName - CV term name.

useTermName - Boolean to set whether the check is done on the termName (TRUE) or on the

termAccession (FALSE and default).

useTerm - This attribute indicates whether the term itself can be used to annotate data (TRUE) or

not (FALSE). This latter case may happen when a term, parent of valid terms for annotation, is

mentioned to keep the mapping concise.

allowChildren - This attribute indicates whether the children of the described term are allowed to

annotate data (TRUE) or not (FALSE).

isRepeatable - Value is 'True' when a term can be repeated in the same instance of the associated

XML element.
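Here is a minimal sketch of how the CvTerm attributes above could combine when the values found at cvElementPath are checked. It assumes OR logic between terms, accession-based matching (useTermName = FALSE) and a toy parent map instead of a real ontology. Counting repeats per CvTerm entry and mapping MAY to INFO are assumptions of this sketch; MUST to ERROR and SHOULD to WARN appears consistent with the sample output shown at the end of this tutorial. All class names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy ontology: child accession -> parent accessions (is_a links only).
class ToyOntology {
    private final Map<String, Set<String>> parents = new HashMap<String, Set<String>>();

    void addIsA(String child, String parent) {
        Set<String> set = parents.get(child);
        if (set == null) {
            set = new HashSet<String>();
            parents.put(child, set);
        }
        set.add(parent);
    }

    // True if 'term' is a strict descendant of 'ancestor' (walks the parent links upwards).
    boolean isDescendant(String term, String ancestor) {
        Deque<String> todo = new ArrayDeque<String>();
        Set<String> seen = new HashSet<String>();
        todo.push(term);
        while (!todo.isEmpty()) {
            Set<String> direct = parents.get(todo.pop());
            if (direct == null) continue;
            for (String p : direct) {
                if (p.equals(ancestor)) return true;
                if (seen.add(p)) todo.push(p);
            }
        }
        return false;
    }
}

// One CvTerm entry of a CvMappingRule (accession-based, i.e. useTermName = FALSE).
class CvTermSpec {
    final String accession;
    final boolean useTerm, allowChildren, isRepeatable;

    CvTermSpec(String accession, boolean useTerm, boolean allowChildren, boolean isRepeatable) {
        this.accession = accession;
        this.useTerm = useTerm;
        this.allowChildren = allowChildren;
        this.isRepeatable = isRepeatable;
    }

    boolean matches(String value, ToyOntology ontology) {
        if (value.equals(accession)) return useTerm;                      // the term itself
        return allowChildren && ontology.isDescendant(value, accession); // one of its children
    }
}

enum RequirementLevel { MAY, SHOULD, MUST }

class CvMappingCheck {
    // Checks the values found at cvElementPath inside one scope instance, OR-combining the terms.
    static List<String> problems(List<String> values, List<CvTermSpec> terms, ToyOntology ontology) {
        List<String> problems = new ArrayList<String>();
        Map<CvTermSpec, Integer> uses = new HashMap<CvTermSpec, Integer>();
        for (String value : values) {
            CvTermSpec matched = null;
            for (CvTermSpec term : terms) {
                if (term.matches(value, ontology)) {
                    matched = term;
                    break;
                }
            }
            if (matched == null) {
                problems.add(value + " matches none of the " + terms.size() + " specified CV term(s)");
                continue;
            }
            // Assumption: repeats are counted per CvTerm entry.
            int count = uses.containsKey(matched) ? uses.get(matched) + 1 : 1;
            uses.put(matched, count);
            if (count > 1 && !matched.isRepeatable) {
                problems.add(value + ": " + matched.accession + " (or a child) may only be used once");
            }
        }
        return problems;
    }

    // MUST -> ERROR and SHOULD -> WARN; MAY -> INFO is assumed.
    static String severity(RequirementLevel level) {
        switch (level) {
            case MUST: return "ERROR";
            case SHOULD: return "WARN";
            default: return "INFO";
        }
    }
}
```

For instance, "the sole term SPE:0326 (protein) or any of its children, a single instance" would correspond to `new CvTermSpec("SPE:0326", true, true, false)` in this sketch.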

Sample configuration file

C. Example of rule definition

Now let's define a toy example on which we will be able to build a sample custom Validator:

In a nutshell, this model describes an experiment containing one to many molecules. Each molecule is characterized by a sequence (if applicable) and a MoleculeType (values taken from an ontology we have defined in an OBO file: molecule-type.obo), and can have zero to many post-translational modifications (values taken from the PSI-MOD ontology).

Here is a graphical representation of the molecule type ontology:

Now let's define some rules based on this data model and express them using the CV mapping.

rule 1: all molecules must have a type that is 'protein' or 'nucleic acid', or one of their child terms

rule 2: if a modification is defined on a molecule, it should be a child term of 'protein modification categorized by amino acid modified' (MOD:01157)

You can download the complete sample file here: cv-mapping.xml

Note: we have tried to develop this component so that it makes the developer's life a little easier when it comes to writing XPath expressions. The component automatically verifies that the XPath expression is valid against the submitted instance of the data model; if it is not, a ValidatorMessage is generated to describe the issue and, where possible, suggest a fix. Let's take a look at an example:

On the model described above, we define the following XPath expression, in which 'molecules' is misspelled: /experiment/molecul/modifications/@id

When you run the validator's CV mapping rules on an instance of experiment that has at least one molecule, you get the following error message:

Could not find property 'molecul' of the xpath expression 'molecul/modifications/@id' (element position: 1) in the given object of: net.sf.psi.spe.Experiment - Did you mean 'molecules' ?
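A suggestion like "Did you mean 'molecules'?" can be produced by comparing the unknown name with the properties that do exist. The sketch below walks a slash-separated path over Java fields with reflection and, on failure, proposes the closest field name by Levenshtein edit distance. This is an assumption about the technique, not a description of the framework's internals: `Term`, `Molecule`, `Experiment` and `PathResolver` are hypothetical mini versions of the SPE model, and the threshold of 3 edits is arbitrary.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical mini model loosely following the SPE example.
class Term {
    final String id;
    Term(String id) { this.id = id; }
}

class Molecule {
    final Term type;
    final List<Term> modifications;
    Molecule(Term type, List<Term> modifications) { this.type = type; this.modifications = modifications; }
}

class Experiment {
    final int id;
    final String name;
    final List<Molecule> molecules;
    Experiment(int id, String name, List<Molecule> molecules) {
        this.id = id;
        this.name = name;
        this.molecules = molecules;
    }
}

class PathResolver {
    // Resolves a relative path such as "molecules/modifications/@id" step by step,
    // flattening collections, and returns the values found at the end of the path.
    static List<Object> resolve(Object root, String path) {
        List<Object> current = new ArrayList<Object>(Collections.singletonList(root));
        String[] steps = path.split("/");
        for (int i = 0; i < steps.length; i++) {
            String name = steps[i].startsWith("@") ? steps[i].substring(1) : steps[i];
            List<Object> next = new ArrayList<Object>();
            for (Object obj : current) {
                Object value = read(obj, name, path, i + 1);
                if (value instanceof Collection) {
                    next.addAll((Collection<?>) value);
                } else if (value != null) {
                    next.add(value);
                }
            }
            current = next;
        }
        return current;
    }

    private static Object read(Object obj, String name, String path, int position) {
        try {
            Field field = obj.getClass().getDeclaredField(name);
            field.setAccessible(true);
            return field.get(obj);
        } catch (NoSuchFieldException e) {
            String hint = closestField(obj.getClass(), name);
            throw new IllegalArgumentException("Could not find property '" + name
                    + "' of the xpath expression '" + path + "' (element position: " + position
                    + ") in the given object of: " + obj.getClass().getName()
                    + (hint == null ? "" : " - Did you mean '" + hint + "' ?"));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Suggests the declared field with the smallest edit distance, if close enough.
    private static String closestField(Class<?> type, String name) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) continue;
            int distance = levenshtein(name, field.getName());
            if (distance < bestDistance) {
                best = field.getName();
                bestDistance = distance;
            }
        }
        return bestDistance <= 3 ? best : null; // arbitrary threshold
    }

    // Classic two-row dynamic-programming edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }
}
```

In this sketch, a misspelled step that comes after a collection is only detected when that collection is non-empty, since there is then no object whose fields could be inspected.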

4. Validator Tutorial: How to Build Your Own Rules?

A. What can these user-defined rules do for you?

Essentially, whenever the CV mapping rules cannot express the validation you want to apply, Object Rules are the alternative. There is inherently no limit to what these rules can do, as long as you are able to program them in Java with the help of the plethora of libraries available on the internet.

B. Implementing your first rule

The validator API defines a class that one has to extend in order to write a rule: psidev.psi.tools.validator.rules.codedrule.ObjectRule

The class diagram below illustrates this part of the Validator's data model:

As you can see in this diagram, in order to fulfill the contract of an ObjectRule, you will have to implement the following methods:

boolean canCheck( Object object );

Collection<ValidatorMessage> check( Object object );

The canCheck method defines which object type (i.e. class) a specific rule is able to validate. The second method, check, performs the validation and returns messages if inconsistencies are detected.
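One common way to implement canCheck is to tie each rule to the class it validates, so that check can safely cast its argument. The sketch below shows that pattern with a hypothetical generic base class, `TypedRule`, and a runner that calls check only when canCheck returns true. `Msg` and `RuleRunner` are likewise stand-ins; the real ObjectRule and ValidatorMessage classes have their own constructors and helpers that are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for ValidatorMessage.
class Msg {
    final String text;
    final String level;

    Msg(String text, String level) {
        this.text = text;
        this.level = level;
    }
}

// Hypothetical base class offering the two methods of the ObjectRule contract.
abstract class TypedRule<T> {
    private final Class<T> type;

    protected TypedRule(Class<T> type) {
        this.type = type;
    }

    // canCheck: can this rule validate objects of this class?
    public boolean canCheck(Object object) {
        return type.isInstance(object);
    }

    // check: only meaningful after canCheck returned true, so the cast is safe.
    public Collection<Msg> check(Object object) {
        return checkTyped(type.cast(object));
    }

    protected abstract Collection<Msg> checkTyped(T object);
}

class RuleRunner {
    // Offers every object to every rule; rules that cannot check it are skipped.
    static List<Msg> run(List<TypedRule<?>> rules, Collection<?> objects) {
        List<Msg> messages = new ArrayList<Msg>();
        for (Object object : objects) {
            for (TypedRule<?> rule : rules) {
                if (rule.canCheck(object)) {
                    messages.addAll(rule.check(object));
                }
            }
        }
        return messages;
    }
}
```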

1. Writing a simple rule

So let's define a first, very simple rule that only accesses the data available in the provided instance of the data model. In this example we are still working with our Simple Proteomics Experiment, whose class diagram is available here.

In this first simple rule, we look into the Experiment and report a warning whenever no name has been given.
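Here is a self-contained sketch of such a rule, under stated assumptions: `SketchObjectRule` mirrors only the canCheck/check signatures listed above, and `MessageLevel`, `SimpleMessage` and `Experiment` are hypothetical stand-ins for the framework and SPE classes. The message text and the WARN level follow the sample output at the end of this tutorial; treating a whitespace-only name as missing is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

enum MessageLevel { DEBUG, INFO, WARN, ERROR, FATAL }

// Hypothetical stand-in for ValidatorMessage.
class SimpleMessage {
    final String message;
    final MessageLevel level;

    SimpleMessage(String message, MessageLevel level) {
        this.message = message;
        this.level = level;
    }
}

// Hypothetical stand-in for the SPE Experiment class.
class Experiment {
    final int id;
    final String name;

    Experiment(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Mirrors only the two methods of the ObjectRule contract.
abstract class SketchObjectRule {
    public abstract boolean canCheck(Object object);

    public abstract Collection<SimpleMessage> check(Object object);
}

class ExperimentNameRule extends SketchObjectRule {
    @Override
    public boolean canCheck(Object object) {
        return object instanceof Experiment;
    }

    @Override
    public Collection<SimpleMessage> check(Object object) {
        Experiment experiment = (Experiment) object;
        List<SimpleMessage> messages = new ArrayList<SimpleMessage>();
        // Null or whitespace-only names are treated as missing.
        if (experiment.name == null || experiment.name.trim().isEmpty()) {
            messages.add(new SimpleMessage(
                    "Experiment id:" + experiment.id + " doesn't have a name.", MessageLevel.WARN));
        }
        return messages;
    }
}
```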

If you wish to run this rule yourself, you can download the source code of this sample validator here.

2. Writing a rule that uses ontologies

Now let's write a rule that reports the following inconsistencies:

If the molecule type is protein (SPE:0326) and a sequence is defined, it has to be composed of amino acids only.

If the molecule type is nucleic acid or one of its child terms and a sequence is defined, it has to be composed of nucleotides only.

If the molecule type is ribonucleic acid or one of its child terms and a sequence is defined, it has to be composed of ribonucleotides only.

If the molecule has no sequence (and is not a small molecule), we report a low-severity (INFO) message.

Here is the rule implementing these constraints:

Please note that, to keep this code sample concise, we have removed the import section. Please download the full source code if you want the complete version.
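The four constraints listed above can be sketched as follows. This is a hypothetical illustration rather than the tutorial's own rule. The ontology lookup is abstracted as a `BiPredicate` answering "is this term the ancestor itself or one of its descendants" (the framework would answer this through its ontology access layer). The accessions for 'ribonucleic acid' and 'small molecule' are passed in because they are not given in the text. The alphabets (20 standard amino acids, ACGTU for nucleic acids, ACGU for RNA) are assumptions, and reporting violations as ERROR is also an assumption, since the text does not specify a level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiPredicate;

// Hypothetical stand-ins for the SPE molecule and a validator message.
class Molecule {
    final String typeId;
    final String sequence;

    Molecule(String typeId, String sequence) {
        this.typeId = typeId;
        this.sequence = sequence;
    }
}

class Finding {
    final String level;
    final String text;

    Finding(String level, String text) {
        this.level = level;
        this.text = text;
    }
}

class SequenceRule {
    // Assumed alphabets (one-letter codes).
    private static final String AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";
    private static final String NUCLEOTIDES = "ACGTU";
    private static final String RIBONUCLEOTIDES = "ACGU";

    private final BiPredicate<String, String> isSelfOrDescendant; // (term, ancestor)
    private final String protein, nucleicAcid, ribonucleicAcid, smallMolecule;

    SequenceRule(BiPredicate<String, String> isSelfOrDescendant, String protein,
                 String nucleicAcid, String ribonucleicAcid, String smallMolecule) {
        this.isSelfOrDescendant = isSelfOrDescendant;
        this.protein = protein;
        this.nucleicAcid = nucleicAcid;
        this.ribonucleicAcid = ribonucleicAcid;
        this.smallMolecule = smallMolecule;
    }

    List<Finding> check(Molecule molecule) {
        List<Finding> findings = new ArrayList<Finding>();
        String type = molecule.typeId;
        String sequence = molecule.sequence;
        if (sequence == null || sequence.isEmpty()) {
            // Missing sequence: low-severity note, except for small molecules.
            if (!isSelfOrDescendant.test(type, smallMolecule)) {
                findings.add(new Finding("INFO", "Molecule of type " + type + " has no sequence"));
            }
            return findings;
        }
        if (type.equals(protein)) {                           // protein itself only
            requireAlphabet(sequence, AMINO_ACIDS, "amino acids", findings);
        }
        if (isSelfOrDescendant.test(type, nucleicAcid)) {     // nucleic acid or its children
            requireAlphabet(sequence, NUCLEOTIDES, "nucleotides", findings);
        }
        if (isSelfOrDescendant.test(type, ribonucleicAcid)) { // ribonucleic acid or its children
            requireAlphabet(sequence, RIBONUCLEOTIDES, "ribonucleotides", findings);
        }
        return findings;
    }

    // Adds one finding for the first symbol that is not in the alphabet.
    private static void requireAlphabet(String sequence, String alphabet, String what,
                                        List<Finding> findings) {
        for (char c : sequence.toUpperCase(Locale.ROOT).toCharArray()) {
            if (alphabet.indexOf(c) < 0) {
                findings.add(new Finding("ERROR",
                        "Sequence should contain " + what + " only, found '" + c + "'"));
                return;
            }
        }
    }
}
```

Because ribonucleic acid sits under nucleic acid, an RNA sequence passes through both the nucleotide and the ribonucleotide checks; the broader ACGTU alphabet keeps the first check from rejecting valid RNA.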

C. Configuring Your Set of Object Rules

1. The Object Rules Schema

2. Example of rule set for the two rules defined above

5. Validator Tutorial: Wiring It Together - Bringing All Components Together

Now that you have created your CV mapping rules and/or your own object rules, the next logical step is to create your own validator.

Here is a graphical representation of the process of building a validator given the separate

components:

As you can see in the above representation, in order to build your own validator, you will have to bring together your configuration files defining ontologies, CV mapping rules and object rules (for which you also have to provide the rule implementations). Once you have brought all of this together inside a project, you can create your own validator as follows:

In this code example, one can see that two methods have been written:

The constructor of the SPE Validator, which essentially passes the 3 configuration files to the generic validator;

The validate method, which takes an Experiment and runs the CV mapping validation as well as the object rule validation. Any message generated in this process is stored in a collection and returned to the calling process.

Now that we have put everything together, it's time to run our validator on some data and display the result of this validation. Since the aim of this tutorial is not to give a lecture on user interfaces, or on how to write them in Java, we will aim for a simple, basic user interface that prints the result of our validation on the command line.
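A command-line report of this kind needs little more than a header with the message count and one line per message. The sketch below assumes hypothetical `Severity` and `Report` types rather than the framework's ValidatorMessage (whose `toString` produces the longer `ValidatorMessage{...}` form). The `worst` helper is an addition of this sketch.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;

enum Severity { DEBUG, INFO, WARN, ERROR, FATAL }

// Hypothetical stand-in for a validator message.
class Report {
    final Severity level;
    final String message;

    Report(Severity level, String message) {
        this.level = level;
        this.message = message;
    }

    @Override
    public String toString() {
        return level + ": " + message;
    }
}

class ConsoleReport {
    // Header line with the count, then one indented line per message.
    static String format(Collection<Report> reports) {
        StringBuilder out = new StringBuilder("Validation run collected ")
                .append(reports.size()).append(" message(s):");
        for (Report report : reports) {
            out.append(System.lineSeparator()).append("  ").append(report);
        }
        return out.toString();
    }

    static void print(Collection<Report> reports, PrintStream stream) {
        stream.println(format(reports));
    }

    // Highest severity collected, or empty when the run produced no message.
    static Optional<Severity> worst(Collection<Report> reports) {
        Severity worst = null;
        for (Report report : reports) {
            if (worst == null || report.level.compareTo(worst) > 0) {
                worst = report.level;
            }
        }
        return Optional.ofNullable(worst);
    }

    public static void main(String[] args) {
        Collection<Report> reports = Arrays.asList(
                new Report(Severity.ERROR, "molecule type SPE:0328 is not allowed"),
                new Report(Severity.WARN, "Experiment id:3 doesn't have a name."));
        print(reports, System.out);
        System.out.println("Worst severity: " + worst(reports).map(Severity::name).orElse("none"));
    }
}
```

Because the enum constants are declared from DEBUG to FATAL, their natural ordering doubles as a severity ranking; a command-line tool could, for example, return a non-zero exit status when `worst` reports ERROR or FATAL.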

Here is what our little program outputs:

Validation run collected 3 message(s):

ValidatorMessage{message='The result found at: /molecules/modifications/@id for which the values are ''BLA:0000X'' didn't match any of the 1 specified CV term: - MOD:01157 (protein modification categorized by amino acid modified) or any of its children. The term can be repeated. The matching value has to be the identifier of the term, not its name.', level=WARN, context=Context(/molecules/modifications/@id ), rule=}

ValidatorMessage{message='The result found at: /molecules/type/@id for which the values are ''SPE:0328'' didn't match any of the 2 specified CV terms: - The sole term SPE:0326 (protein) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name. - SPE:0318 (nucleic acid) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.', level=ERROR, context=Context(/molecules/type/@id ), rule=}

ValidatorMessage{message='Experiment id:3 doesn't have a name.', level=WARN, context=null, rule=null}

6. Validator Tutorial: Download Validator's Tutorial Source Code

Here are a few things you can download to get you started with the Validator:

The latest Validator framework can be downloaded from here.

The Simple Proteomics Experiment (SPE) sample project can be downloaded from here.

This archive contains two projects: the SPE data model and the SPE simple validator.

