Constructing Complex Queries in Pathway Tools using Emacs, Lisp, and Perl
Randy Gobbel, Ph.D.
May 14, 2003
SRI International Bioinformatics
Overview
Why would you need to write complex queries?
Emacs
Lisp
perlcyc
The GFP API and Pathway Tools-specific functions
Examples and exercises
When do you need complex queries?
Many common queries are accessible from the command menu:
  By name
  By substring
  By class
  Others are specialized by the type of the object being displayed
Other queries of arbitrary complexity can be created by writing a (simple) program
  Example: find all reactions with more than 5 citations
Programmatic Access to PGDBs
Lisp and Perl can be used for programmatic queries and updates to PGDBs
The Generic Frame Protocol (GFP) is the API for PGDBs
Emacs
“The extensible, self-documenting editor”
(Most of the time) typing a printing character simply inserts it
  Just like most Windows and MacOS programs
Control and Meta keys in combination with other keys run commands
  Again, just like keyboard shortcuts in most programs
Control-H: Help
  T -> tutorial
  A -> apropos
  W -> “where is <command>”
  K -> “what does this key combination do?”
Many commands are now available from pulldown menus
Emacs
Three ways to run Pathway Tools from within Emacs:
  Use the Emacs/Lisp interface provided with Allegro Common Lisp (fi)
  Use the free ILisp package (written in Emacs Lisp)
  Run Pathway Tools from a shell within Emacs
Windows users: lowest common denominator
  Cut and paste still works
Advantages of using Emacs with Lisp:
  Syntax highlighting
  Automatic indentation
  One-keystroke evaluation of Lisp forms in fi and ILisp
Lisp
An idea that keeps reinventing itself
  Function, arguments
What is a list?
  Unit of syntax: (a b c)
  Unit of data: (a b c)
  Unit of execution: (get-slot-value 'arca 'citations)
Most languages: function(arg1, arg2, …)
  Fine for writing
Lisp: (function arg1 arg2 arg3 …)
  Much easier for a computer to deal with
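Because a program and a list have the same shape, a list built as ordinary data can be inspected with list functions and then handed to EVAL to run as a function call. A minimal sketch in standard Common Lisp; the variable name *form* is just an illustration:

```lisp
;; Build a list whose first element is the symbol + and whose
;; remaining elements are numbers.  At this point it is only data.
(defparameter *form* (list '+ 1 2 3))

;; Inspect it with ordinary list functions...
(first *form*)   ; => +
(rest *form*)    ; => (1 2 3)

;; ...or treat the very same list as a function call and run it.
(eval *form*)    ; => 6
```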
Lisp Data Types
Numbers: 1, 1.325
Strings: "hello"
Symbols, e.g.: ARCA (or arcA)
  Make a literal symbol by quoting it: 'ARCA
  Case-sensitive symbols require vertical bars: '|Genes|
Special symbols: T and NIL
  Used to mean True and False
  NIL is also the empty list: ()
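The symbol rules above can be checked directly at the listener. This sketch uses only standard Common Lisp and assumes the default reader, which upcases unquoted symbol names:

```lisp
;; Unquoted symbol names are upcased by the reader, so these are one symbol.
(eq 'arcA 'ARCA)              ; => T
(symbol-name 'arcA)           ; => "ARCA"

;; Vertical bars preserve case exactly.
(symbol-name '|Genes|)        ; => "Genes"
(eq '|Genes| 'genes)          ; => NIL, because 'genes reads as GENES

;; NIL is false and is also the empty list.
(eq nil '())                  ; => T
(if nil 'true 'false)         ; => FALSE

;; Strings and numbers are their own types.
(stringp "hello")             ; => T
(numberp 1.325)               ; => T
```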
Lisp Expressions and Evaluation
(+ 3 4 5)
  + is a function
  (+ 3 4 5) is a function call with 3 arguments
Arguments are evaluated:
  Numbers evaluate to themselves
  If any of the args are themselves expressions, they are also evaluated
    (+ 1 (+ 3 4)) => 8
The values of the args are passed to the function
Some functions allow variable numbers of arguments:
  (+) => 0
  (+ 1) => 1
  (+ 2 3 1 3 4 5 6) => 24
(+ (* 3 4) 6) => 18
Lisp Expressions and Evaluation
The Lisp listener is also called the “top level” or the “read-eval-print loop”
It uses a three-step process:
  Read: the reader converts everything outside "" and || to uppercase
  Evaluate
  Print
Anything you type in is evaluated:
  1 => 1
  "hello" => "hello"
  (+ 2 3) => 5
Quoting prevents evaluation:
  '(+ 2 3) => (+ 2 3)
Setting a symbol to a value creates a variable:
  (setq foo '(a b c)) => (A B C)
  foo => (A B C)
  No declarations required!
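One pass through the read-eval-print loop can be reproduced with standard Common Lisp functions. In this sketch, READ-EVAL-PRINT-ONCE is a hypothetical helper name, not something Lisp or Pathway Tools provides; it reads one form from a string, evaluates it, and returns what the listener would print:

```lisp
;; READ-EVAL-PRINT-ONCE is an illustrative helper, not a built-in.
(defun read-eval-print-once (text)
  "Read one form from TEXT, evaluate it, and return its printed form."
  (let* ((form (read-from-string text)) ; Read: text -> Lisp data (symbols upcased)
         (value (eval form)))           ; Evaluate
    (prin1-to-string value)))           ; Print, the way the listener shows values

(read-eval-print-once "(+ 2 3)")       ; => "5"
(read-eval-print-once "\"hello\"")     ; => "\"hello\""
(read-eval-print-once "'(+ 2 3)")      ; => "(+ 2 3)"
(read-eval-print-once "'(a b c)")      ; => "(A B C)"
```

The last call shows the reader's upcasing: the text typed in lowercase comes back printed in uppercase.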
The Lisp Listener
Useful forms in the listener:
  Previous results: *, **, ***
  But: not in programs
(+ 1 2) => 3
(+ 3 *) => 6
** => 3
Dealing with the Lisp debugger
Error conditions result in a call to the Lisp debugger:
  :continue continues; a numeric argument selects between the possible options
    Lower-numbered options generally take less drastic actions
  :reset unwinds to the top level
    WARNING: may exit the Pathway Tools window!
  :zoom displays the stack
EC(4): x
*debugger-hook* called.
Error: Attempt to take the value of the unbound variable `X'.
  [condition type: UNBOUND-VARIABLE]

Restart actions (select using :continue):
 0: Try evaluating X again.
 1: Use :X instead.
 2: Set the symbol-value of X and use its value.
 3: Use a value without setting X.
 4: Return to Top Level (an "abort" restart).
 5: Abort entirely from this process.
[1] EC(5): :res
Lisp Variables
Global variable values can be set and used during a session
Declarations are not needed
(setq x 5) => 5
x => 5
(+ 3 x) => 8
(setq y "atgc") => "atgc"
Equality in LISP
Internally, Lisp refers to objects via pointers
The fundamental equality operation is EQ
  True if the two arguments point to the same object
  Very efficient
Other comparison operators:
  = for numbers: (= x 4)
  EQUAL for list structures or exact (case-sensitive) string matching: (equal x "abc")
  STRING-EQUAL for case-insensitive string matching: (string-equal x "AbC")
  EQL for characters: (eql x #\A)
  EQ for list structures or symbols (compares pointers): (eq x 'ABC)
  FEQUAL for frames: (fequal x 'trp)
Simple rule: use EQUAL for everything except frames
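The difference between comparing pointers and comparing structure is easiest to see with two separately built lists that have the same contents. This sketch covers only the standard Common Lisp operators; FEQUAL is Pathway Tools-specific and is left out so the block runs in any Common Lisp:

```lisp
(defparameter *a* (list 'x 'y))
(defparameter *b* (list 'x 'y))   ; same contents, but a separate list object

(equal *a* *b*)            ; => T   (same structure)
(eq *a* *b*)               ; => NIL (two different objects)
(eq *a* *a*)               ; => T   (same pointer)
(eq 'abc 'ABC)             ; => T   (both read as the one symbol ABC)
(= 4 4.0)                  ; => T   (numeric comparison)
(equal "abc" "abc")        ; => T
(equal "abc" "ABC")        ; => NIL (EQUAL is case-sensitive on strings)
(string-equal "abc" "AbC") ; => T
(eql #\A #\A)              ; => T
```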
Functions for Operating on Lists
length (length x) Returns the number of elements
first (first x) Returns the first element
nth (nth j x) Returns the Jth element of list X (element 0 is the first element)
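A quick check of the three list functions on a short list. The frame names here are hypothetical placeholders, not real PGDB frames:

```lisp
(defparameter *rxns* (list 'rxn-1 'rxn-2 'rxn-3))  ; hypothetical frame names

(length *rxns*)    ; => 3
(first *rxns*)     ; => RXN-1
(nth 0 *rxns*)     ; => RXN-1, same as FIRST
(nth 2 *rxns*)     ; => RXN-3, the last element
(nth 5 *rxns*)     ; => NIL, the index is past the end
(length '())       ; => 0
```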
loop
Loop allows you to iterate:
  Through a series of numbers:
    for i from 1 to 10
  Through a list:
    for rxn in rxns
Conditionals control whether the clause that follows runs for the current item:
  when (> (length (get-slot-values rxn 'citations)) 5)
do lets you do something:
  do (setq total (+ total i))
collect lets you gather up values:
  collect (get-frame-name rxn)
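The loop clauses above can be tried without a PGDB by substituting a small hand-made table for the real reaction frames. In this sketch the data in *reaction-citations* and the helper names SUM-1-TO-N and HEAVILY-CITED are hypothetical; in Pathway Tools the citation list for each reaction would come from (get-slot-values rxn 'citations):

```lisp
;; DO runs a form purely for its side effect; here it updates TOTAL.
(defun sum-1-to-n (n)
  (let ((total 0))
    (loop for i from 1 to n
          do (setq total (+ total i)))
    total))

;; Hypothetical stand-in data: (reaction-name citation ...) per reaction.
(defparameter *reaction-citations*
  '((rxn-a "c1" "c2" "c3" "c4" "c5" "c6")
    (rxn-b "c1" "c2")
    (rxn-c "c1" "c2" "c3" "c4" "c5" "c6" "c7")))

(defun heavily-cited (table min)
  "Names of reactions in TABLE with more than MIN citations."
  (loop for (rxn . citations) in table     ; iterate through a list
        when (> (length citations) min)    ; conditional clause
          collect rxn))                    ; gather the matches

(sum-1-to-n 10)                           ; => 55
(heavily-cited *reaction-citations* 5)    ; => (RXN-A RXN-C)
```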
loop
You can combine as many loop clauses as you need:
(loop
for i from 1 to 10
for j from 10 downto 1
do (print (+ i j))
collect (* i j))
(10 18 24 28 30 30 28 24 18 10)
Defining Functions
Put function definitions in a file
  Reload the file when definitions change:
  EC(1): :ld my-queries.lisp
(defun <name> (<arguments>) … code for function …)
  Creates a new operation called <name>
Examples:
  (defun square (x) (* x x))
  (defun message () (print "Hello"))
  (defun test-fn () 1 2 3 4)   ; returns 4, the value of its last form
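Definitions like these would normally live in a file such as my-queries.lisp and be reloaded with :ld at the listener (or with the standard (load "my-queries.lisp") from a program). The sketch below repeats two of the example definitions and adds SUM-OF-SQUARES, a hypothetical helper, to show one function building on another:

```lisp
(defun square (x) (* x x))

;; Each form in the body is evaluated in turn; the value of the last
;; one is what the function returns.
(defun test-fn () 1 2 3 4)

;; Hypothetical helper built on SQUARE.
(defun sum-of-squares (numbers)
  (loop for n in numbers sum (square n)))

(square 7)                   ; => 49
(test-fn)                    ; => 4
(sum-of-squares '(1 2 3))    ; => 14
```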
Accessing Lisp from Pathway Tools
Starting Pathway Tools for Lisp work:
  > pathway-tools -lisp
  EC(1): (select-organism :org-id 'XXX)
  Windows: pathway-tools-lisp.exe
Lisp expressions can be typed at any time to the Pathway Tools listener:
  Command: (get-slot-value 'trp 'common-name)
  "L-tryptophan"
Invoking the Navigator from Lisp:
EC(2): (eco)
The perlcyc API
Written by Lukas Mueller at TAIR
  Downloadable from the TAIR Web site
  Installs as a standard CPAN module
From within Pathway Tools, start the server by hand:
  (start-external-access-daemon)
  (start-external-access-daemon :verbose? t) for tracing output
Function names are the same as in Lisp, with hyphens replaced by underscores and question marks by _p:
  get-class-all-instances -> get_class_all_instances
  coercible-to-frame? -> coercible_to_frame_p
Pathway Tools functions are callable as standard Perl functions
Frame names are symbols which can be passed back to Lisp
Control structures are standard Perl
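The naming rule can be written down as a small string transformation. This is an illustration of the rule as stated on this slide, not perlcyc's own code; LISP-NAME-TO-PERL-NAME is a hypothetical name:

```lisp
;; Illustration of the stated rule only; not taken from perlcyc.
(defun lisp-name-to-perl-name (name)
  "Replace hyphens with underscores and a trailing ? with _p."
  (let ((s (substitute #\_ #\- name)))          ; - becomes _
    (if (and (plusp (length s))
             (char= (char s (1- (length s))) #\?))
        (concatenate 'string (subseq s 0 (1- (length s))) "_p")
        s)))

(lisp-name-to-perl-name "get-class-all-instances") ; => "get_class_all_instances"
(lisp-name-to-perl-name "coercible-to-frame?")     ; => "coercible_to_frame_p"
```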
javacyc
Uses the same Unix domain socket interface as perlcyc
Function names use Java conventions:
  get-slot-values -> getSlotValues
Includes a C library for Unix domain sockets
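The Java-style naming can likewise be sketched as a conversion from hyphenated names to camelCase. LISP-NAME-TO-JAVA-NAME is a hypothetical helper; it handles only hyphens, and how javacyc treats names ending in ? is not covered here:

```lisp
;; Illustration only: hyphenated Lisp name -> Java-style camelCase.
(defun lisp-name-to-java-name (name)
  (with-output-to-string (out)
    (let ((upcase-next nil))
      (loop for ch across name
            do (cond ((char= ch #\-) (setf upcase-next t))     ; drop the hyphen
                     (upcase-next (write-char (char-upcase ch) out)
                                  (setf upcase-next nil))      ; capitalize next letter
                     (t (write-char ch out)))))))

(lisp-name-to-java-name "get-slot-values")          ; => "getSlotValues"
(lisp-name-to-java-name "get-class-all-instances")  ; => "getClassAllInstances"
```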
Lisp vs. Perl
Task: find all reactions with fewer than 5 citations
Perl:
  use perlcyc;
  my $cyc = perlcyc->new("ECOLI");
  my @found;
  foreach my $r ($cyc->all_rxns()) {
      my @citations = $cyc->get_slot_values($r, "citations");
      if (scalar(@citations) < 5) {
          push @found, $r;
      }
  }
Lisp:
  (loop for r in (all-rxns)
        when (< (length (get-slot-values r 'citations)) 5)
        collect r)
Pathway Tools User-Accessible Functions
Internal Pathway Tools functions that users can call
Includes:
  Generic Frame Protocol (GFP), the Ocelot object database API
  Additional functions specific to Pathway Tools
For more information see http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Generic Frame Protocol (GFP)
A library of Lisp functions for accessing Ocelot DBs
GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html
A small number of GFP functions are sufficient for most complex queries
Generic Frame Protocol
(get-class-all-instances Class) Returns all instances of Class, including instances of its subclasses
Key Pathway Tools classes:
  Genetic-Elements
  Genes
  Proteins
    Polypeptides (a subclass of Proteins)
    Protein-Complexes (a subclass of Proteins)
  Pathways
  Reactions
  Compounds-And-Elements
  Enzymatic-Reactions
  Transcription-Units
  Promoters
  DNA-Binding-Sites
Generic Frame Protocol
Note: Frame.Slot means a specified slot of a specified frame
  Frame and Slot must be symbols!
(get-slot-value Frame Slot) Returns first value of Frame.Slot
(get-slot-values Frame Slot) Returns all values of Frame.Slot as a list
(slot-has-value-p Frame Slot) Returns T if Frame.Slot has at least one value
(member-slot-value-p Frame Slot Value) Returns T if Value is one of the values of Frame.Slot
(print-frame Frame) Prints out the contents of Frame
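To make the difference between these accessors concrete without a running Pathway Tools, here is a toy in-memory frame store. Everything in it is hypothetical: the MOCK- functions only imitate the behavior described above and are not GFP, and the slot contents for TRP (other than the common name shown earlier) are made-up sample data:

```lisp
;; Toy frame store: frame symbol -> hash table of slot -> list of values.
(defparameter *mock-kb* (make-hash-table :test #'eq))

(defun mock-put-slot-values (frame slot values)
  (let ((slots (or (gethash frame *mock-kb*)
                   (setf (gethash frame *mock-kb*)
                         (make-hash-table :test #'eq)))))
    (setf (gethash slot slots) (copy-list values))))

(defun mock-get-slot-values (frame slot)
  "All values of FRAME.SLOT as a list; NIL if there are none."
  (let ((slots (gethash frame *mock-kb*)))
    (if slots (values (gethash slot slots)) nil)))

(defun mock-get-slot-value (frame slot)
  "Only the first value of FRAME.SLOT."
  (first (mock-get-slot-values frame slot)))

(defun mock-slot-has-value-p (frame slot)
  (if (mock-get-slot-values frame slot) t nil))

(defun mock-member-slot-value-p (frame slot value)
  (if (member value (mock-get-slot-values frame slot) :test #'equal) t nil))

;; Hypothetical sample contents for the frame TRP.
(mock-put-slot-values 'trp 'common-name '("L-tryptophan"))
(mock-put-slot-values 'trp 'synonyms '("trp" "W"))

(mock-get-slot-value 'trp 'synonyms)             ; => "trp"
(mock-get-slot-values 'trp 'synonyms)            ; => ("trp" "W")
(mock-slot-has-value-p 'trp 'citations)          ; => NIL
(mock-member-slot-value-p 'trp 'synonyms "W")    ; => T
```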
More useful functions
(coercible-to-frame-p Thing) Returns T if Thing is the name of a frame, or a frame object
(save-kb) Saves the current KB
(replace-answer-list <list of frames>) Makes the specified frames browseable via the Pathway Tools GUI
Generic Frame Protocol: Update Operations
(put-slot-value Frame Slot Value) Replace the current value(s) of Frame.Slot with Value
(put-slot-values Frame Slot Value-List) Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
(add-slot-value Frame Slot Value) Add Value to the current value(s) of Frame.Slot, if any
(remove-slot-value Frame Slot Value) Remove Value from the current value(s) of Frame.Slot
(replace-slot-value Frame Slot Old-Value New-Value) In Frame.Slot, replace Old-Value with New-Value
(remove-local-slot-values Frame Slot) Remove all of the values of Frame.Slot
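The effect of each update operation on a slot's value list can be modeled with a hash table keyed by (frame . slot). This is a hypothetical toy, not Pathway Tools code: the TOY- names are invented, and two details are assumptions of the sketch rather than documented behavior, namely that an added value goes at the end of the list and that values are compared with EQUAL:

```lisp
;; Toy model keyed by (frame . slot).  ADD appends and EQUAL is used for
;; comparisons: both are assumptions of this sketch.
(defparameter *toy-slots* (make-hash-table :test #'equal))

(defun toy-values (frame slot)
  (values (gethash (cons frame slot) *toy-slots*)))

(defun toy-put-slot-values (frame slot value-list)
  (setf (gethash (cons frame slot) *toy-slots*) (copy-list value-list)))

(defun toy-put-slot-value (frame slot value)
  (toy-put-slot-values frame slot (list value)))

(defun toy-add-slot-value (frame slot value)
  (toy-put-slot-values frame slot (append (toy-values frame slot) (list value))))

(defun toy-remove-slot-value (frame slot value)
  (toy-put-slot-values frame slot
                       (remove value (toy-values frame slot) :test #'equal)))

(defun toy-replace-slot-value (frame slot old-value new-value)
  (toy-put-slot-values frame slot
                       (substitute new-value old-value (toy-values frame slot)
                                   :test #'equal)))

(defun toy-remove-local-slot-values (frame slot)
  (remhash (cons frame slot) *toy-slots*))

(toy-put-slot-values 'trp 'synonyms '("trp" "W"))
(toy-add-slot-value 'trp 'synonyms "tryptophan")     ; => ("trp" "W" "tryptophan")
(toy-remove-slot-value 'trp 'synonyms "W")           ; => ("trp" "tryptophan")
(toy-replace-slot-value 'trp 'synonyms "trp" "Trp")  ; => ("Trp" "tryptophan")
(toy-put-slot-value 'trp 'synonyms "L-trp")          ; => ("L-trp")
(toy-remove-local-slot-values 'trp 'synonyms)
(toy-values 'trp 'synonyms)                          ; => NIL
```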
Additional Pathway Tools Functions: Semantic Inference Layer
The semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
http://bioinformatics.ai.sri.com/ptools/ptools-fns.html
GKB editor
GUI for browsing the frame hierarchy
  Command: Special > Taxonomy Viewer
  View > Browse Class Hierarchy (Ctrl-B)
Allows viewing of classes, slots, and instances
You can’t write a query unless you know the exact class and slot names
Class names are usually case-sensitive symbols: |Genes|, |Proteins|, …
LISP and GFP References
Common Lisp, the Language: the standard reference
  Paper edition by Guy Steele
  Online reference (the Common Lisp HyperSpec): http://www.lispworks.com/reference/HyperSpec/Front/index.htm
Information on writing Pathway Tools queries:
  http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
  http://www.ai.sri.com/pkarp/loop.html
  http://bioinformatics.ai.sri.com/ptools/debugger.html
Pathway Tools information Web site
Top-level page: http://www.biocyc.org/
General Pathway Tools information: http://bioinformatics.ai.sri.com/ptools/
How to submit a bug report: http://bioinformatics.ai.sri.com/ptools/bug.html
Writing queries, introductions to Lisp, etc.: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Examples
(select-organism :org-id 'ecoli)
ECOLI
(setq genes (get-class-all-instances '|Genes|))
(…)
(setq monomers (get-class-all-instances '|Polypeptides|))
(…)
(setq genes2 genes)
(…)
Problems
all-substrates
enzymes-of-reaction
genes-of-reaction
genes-of-pathway
monomers-of-protein
genes-of-enzyme
Example Session
(setq x 'trp)
TRP
(get-slot-value x 'common-name)
"L-tryptophan"
(setq aas (get-class-all-instances '|Amino-Acids|))
(…)
(loop for x in aas count x)
20
Example Session
(loop for x in genes
      for name = (get-slot-value x 'common-name)
      when (and name (search "trp" name))
      collect x)
(…)
(setq rxns (get-class-all-instances '|Reactions|))
(…)
(loop for x in rxns
      when (member-slot-value-p x 'substrates 'trp)
      collect x)
(…)
(replace-answer-list *)
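The name-search query can be tried on stand-in data to see how the NAME guard and SEARCH behave. In this sketch *gene-names* and GENES-MATCHING are hypothetical; in Pathway Tools each name would come from (get-slot-value gene 'common-name). SEARCH returns the position of the match, and since only NIL counts as false, a match at position 0 still passes the test. SEARCH is case-sensitive unless given :test #'char-equal:

```lisp
;; Hypothetical stand-in for genes and their common names; G3 has no name.
(defparameter *gene-names*
  '((g1 . "trpA") (g2 . "araC") (g3 . nil) (g4 . "trpE") (g5 . "TrpR")))

(defun genes-matching (substring gene-names &key (test #'eql))
  (loop for (gene . name) in gene-names
        ;; Skip unnamed genes; SEARCH returns a position (possibly 0) or NIL.
        when (and name (search substring name :test test))
          collect gene))

(genes-matching "trp" *gene-names*)                       ; => (G1 G4)
(genes-matching "trp" *gene-names* :test #'char-equal)    ; => (G1 G4 G5)
```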
Example Session
(setq x '(trp arg))
(TRP ARG)
(replace-answer-list x)
(TRP ARG)
(eco)
How to write a good bug report
Use dribble-bug:
  (excl:dribble-bug "bug.txt") to start dribbling
  (excl:dribble-bug) to stop
How to get out of the debugger:
  :bt - short backtrace of what functions are being called
  :zoom - more detailed trace
  :cont <n> - continue; lower numbers are less drastic
Be specific, and as detailed as you can stand:
  What button/key did you push?
  Which screen/editor were you using at the time?
  What object were you viewing/editing?
Try to find a reproducible test case if at all possible!
How to use autopatch
Patches load automatically on startup, or use Special > Install Patches:
  Download and install, or simply install
  Goes to our Web server, gets patches, and installs them
Restarting is usually not required
  Functions are redefined on the fly
  But: if the patch involved initialization, you might need to restart