Post on 28-Mar-2015

PENNSTATE

Compatible text, visual and mathematical representations for biological process ontologies

Nigam Shah

Penn State University

Ontologies in Molecular Biology

• An ontology is a formal way of representing knowledge.

– In an ontology, concepts are described both by their meaning and their relationship to each other.*

– First name ‘things’ … then name ‘relations’.

• Gene Ontology

• 43 open ontologies under OBO

• If we specify the ‘logic’ of combining ‘things’ and ‘relations’ we can write hypotheses about biological processes in a formal manner & evaluate them for consistency with existing information.

* Bard and Rhee, Nature Reviews Genetics, Vol 5, March 2004, pg 213

Hypotheses and Events

An hypothesis about a biological process is a statement about relationships within a biological system.

Protein P induces transcription of gene X

We define an ‘event’ as a relationship between two biological entities, which we call ‘agents’.

Testing events

Protein P induces transcription of gene X

[Figure: diagram labeled with P, nucleus, and promoter | gene X]

Implicit claims (that can be made explicit):

1. P is a transcription factor.

2. P is a transcriptional activator.

3. P is localized to the nucleus.

4. P can bind to the promoter of gene X.

Hypothesis Ontology

• Expressive enough to describe the galactose system at a coarse level of detail.

• It is compatible with other ontology efforts.

– E.g. GO, so that GO annotations can be used directly in HyBrow.

• We have also developed a grammar to write hypotheses using events from this ontology.

Grammar for a hypothesis

A hypothesis consists of at least one event stream

An event stream is a sequence of one or more events or event streams with logical joints (or operators) between them.

An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also has a physical location that denotes ‘where’ the event happened, the genetic context of the organism and associated experimental perturbations when the event happened.

A logical joint is the conjunction between two event streams.
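The grammar above can be sketched as a small set of data structures. This is only an illustration, not HyBrow's implementation: the class and field names (`Event`, `EventStream`, `Hypothesis`, `joints`, `perturbations`) are assumptions chosen to mirror the wording of the slide.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Event:
    """Exactly one agent_a, one operator and one agent_b, plus context."""
    agent_a: str
    operator: str               # the relationship between the two agents
    agent_b: str
    location: str = ""          # 'where' the event happened
    genetic_context: str = ""   # genetic context of the organism
    perturbations: tuple = ()   # associated experimental perturbations


@dataclass
class EventStream:
    """One or more events or event streams, with a logical joint between each pair."""
    items: List[Union["Event", "EventStream"]]
    joints: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.items:
            raise ValueError("an event stream needs at least one event")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("need one logical joint between neighbouring items")


@dataclass
class Hypothesis:
    """A hypothesis consists of at least one event stream."""
    streams: List[EventStream]

    def __post_init__(self):
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


# Usage: the example event from the earlier slide, wrapped in a one-event stream.
ev = Event("P", "induces transcription of", "gene X", location="nucleus")
hyp = Hypothesis([EventStream([ev])])
```

Rejecting malformed objects at construction time keeps the "exactly one" and "at least one" clauses of the grammar enforced in one place.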

Making Hypotheses with increasing ‘formality’

1. Controlled Vocabulary

2. Formal Language

3. Context-Free Grammar

We have developed a formal language & grammar for representing an hypothesis as a sequence of events.

We use ‘constraints and rules’ to decide if an hypothesis is a valid production of the language.

The mathematical representation

A biological event is any occurrence for which we gather experimental data.

Hypotheses make testable statements about combinations of biological events.

http://conferences.computer.org/bioinformatics/CSB2003/SectA.html#Poster9

Constraints and Rules

• Consistency of an hypothesis with prior knowledge is evaluated by applying constraints and rules.

• A constraint is a statement specifying the evidence that contradicts or supports an event.

• A protein must be in the nucleus to bind to a promoter.

• A rule comprises the ‘steps’ for deciding whether a constraint is satisfied or violated.

Binds_to_promoter [P, g]:

Annotation constraints

if cellular location of P is not nucleus, give a penalty.

if biological process is not transcription, give a penalty.
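The two annotation constraints for Binds_to_promoter can be written out as a short rule. This is a sketch under stated assumptions: the annotation table, the proteins `P` and `Q`, the unit penalty, and the function name are all hypothetical, and treating a missing annotation as a violation is a choice made here, not something the slide specifies.

```python
# Hypothetical annotation records for two made-up proteins. A real system
# would read these from an annotation database instead.
ANNOTATIONS = {
    "P": {"location": "nucleus", "process": "transcription"},
    "Q": {"location": "cytoplasm", "process": "transport"},
}

PENALTY = 1  # assumed unit penalty per violated constraint


def binds_to_promoter_penalty(protein, annotations=ANNOTATIONS):
    """Sum the penalties for the annotation constraints of Binds_to_promoter[P, g]."""
    record = annotations.get(protein, {})  # unknown protein: no annotations
    penalty = 0
    # Constraint 1: the cellular location of P must be the nucleus.
    if record.get("location") != "nucleus":
        penalty += PENALTY
    # Constraint 2: the biological process must be transcription.
    if record.get("process") != "transcription":
        penalty += PENALTY
    return penalty
```

With the table above, `binds_to_promoter_penalty("P")` gives 0 and `binds_to_promoter_penalty("Q")` gives 2, one penalty per violated constraint.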

A point-n-click interface

Visual language representation

Uses a formal Visual Language:

1. Direct composition of hypotheses in a format akin to reaction pathway diagrams

2. Translatable to other representation forms

Other notations:

Cook Notation -- BioD

Kohn Notation

Multiple ‘views’ of the ontology

• Once we have an ontology for hypotheses … it can be represented as

– Text files that users type.

– Formal constructs that can be evaluated for validity in a formal manner.

– Files that are ‘browsed’ by using special programs.

• Having such equivalent formats allows us to perform computer aided hypothesis-evaluation.

Multiple equivalent representations

Biological process described in a formal language

ev0 = Gal2p transports galactose in mem in wt

ev1 = galactose activate Gal3p in wt in cyt

ev2 = Gal3p Binds_to_promoter gal1 in wt in nuc

ev3 = Gal3p induce gal1 in presence_of galactose in wt in nuc

hy1 = (ev0+ev1) and (ev2+ev3)
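To show how such text could become a structured form, here is a minimal sketch of a parser for the event lines above. It assumes each line has the shape `name = agent_a operator agent_b` followed by any number of `in ...` qualifiers. Because the examples list location and genetic context in varying order, the qualifiers are kept as plain strings and not interpreted. The function name and the dictionary keys are hypothetical, and the `hy1` expression is not handled.

```python
def parse_event(line):
    """Parse 'name = agent_a operator agent_b in q1 in q2 ...' into a dict."""
    name, sep, body = line.partition("=")
    tokens = body.split()
    if not sep or len(tokens) < 3:
        raise ValueError(f"not an event line: {line!r}")
    agent_a, operator, agent_b = tokens[:3]

    # Everything after the third token is a run of 'in <words>' qualifiers.
    qualifiers = []
    current = None
    for token in tokens[3:]:
        if token == "in":
            if current is not None:
                qualifiers.append(" ".join(current))
            current = []
        elif current is None:
            raise ValueError(f"expected 'in' before {token!r}")
        else:
            current.append(token)
    if current is not None:
        qualifiers.append(" ".join(current))

    return {
        "name": name.strip(),
        "agent_a": agent_a,
        "operator": operator,
        "agent_b": agent_b,
        "qualifiers": qualifiers,
    }


event = parse_event("ev3 = Gal3p induce gal1 in presence_of galactose in wt in nuc")
```

For `ev3`, the qualifiers come out as `["presence_of galactose", "wt", "nuc"]`, which a later step could map onto location, genetic context and perturbation.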

XML format?

Evaluating an hypothesis

Demo

[Architecture diagram. Components: Inference rules, Event Handler, Justification routines, Neighboring events generator, Hypothesis parser and ranking rules, Result formatter, Visual Widget, Hypothesis file, Browser, User, Database]

Screen shot of the output

A. Representation of an hypothesis in terms of events (ev = event).

B. Holding the mouse over a neighboring hypothesis (b1) shows which event was replaced to create it.

C. Plot of support versus conflicts for the submitted and neighboring hypotheses (n1, b1). Clicking on n1 submits that hypothesis as the ‘seed’.
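The captions describe a neighboring hypothesis as one created by replacing an event. A minimal sketch of that idea follows, assuming a hypothesis is a flat list of event names and that candidate replacements come from a lookup table. How HyBrow actually chooses replacements is not stated here, so `alternatives` and the event name `ev1b` are purely hypothetical.

```python
def neighbors(hypothesis, alternatives):
    """Yield (replaced_index, neighbor) pairs; each neighbor differs in one event."""
    for i, event in enumerate(hypothesis):
        for alt in alternatives.get(event, []):
            # Copy the hypothesis with only position i swapped out.
            yield i, hypothesis[:i] + [alt] + hypothesis[i + 1:]


# Usage with made-up event names: ev1 has one candidate replacement, ev1b.
seed = ["ev0", "ev1"]
result = list(neighbors(seed, {"ev1": ["ev1b"]}))
```

Returning the replaced index alongside each neighbor is what would let an interface report which event was swapped.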

Credits

• Stephen Racunas

– sar147@psu.edu

• Nina Fedoroff (Mentor)

– nvf1@psu.edu

More on the project website, www.hybrow.org, and on Aug 1st @ 11:10 AM.