
Post on 16-Dec-2015


Agenda

• VAPrototype
• Genome Services
• iPlant API

GenAV: Analysis & Visualization of Affymetrix Gene Expression Data

GUI inputs:

• Design info/file
• GEO#
• GO terms of interest

GUI outputs:

• GenExpr2Ddata
• GenExprSum
• ContrastFiles
• EnrichedGO graphs
• GeneMANIA output

[Workflow diagram labels: User; AffyGenAnalyser; BiNGO; GOAnalyser; Enriched GO terms; GeneMANIA; GeneLists; Output (text files, graphs)]


LIVE DEMO


How did we get here?

Working group-led iteration and discussion (Jan-June)

• Componentization
• Reusability
• Identification of potential GUI representations for work products

Summer Supercomputing Institute Meeting (July)

• Refinement of workflow
• Identification of entry and exit points
• Iteration on GUI representations
• Cyberinfrastructure-oriented design
• Implementation decisions
  • Technology/language
  • Work allocation


Expression Analysis: Bioconductor limma

• Retrieve data
• Specify experiment design
• Normalize (gcRMA)
• Linear model fit
• Bayesian correction
• Hypothesis testing
• Emit results

NCBI GEO

iPlant Data Storage API

•Limma is a standard module for expression analysis

•Limma incorporates translation and integration code to handle most common array platforms

•Limma writes verbose but consistent delimited results

•People know how to use BioC/Limma and can do so on their desktop systems

Entry point: the user uploads an expression file into the iPlant Data API


VAPrototype

• Retrieve data via /data API
• Iterate over experiments
• Perform category enrichment
• Consolidate results
• Return as JSON data structure

http://medea/iplant/js/application.js

1. Invoke VAPrototype via iPlant /jobs API
2. Poll for service to complete
3. Fetch results as JSON
4. Render to dynamic table
5. Interpret user interactions
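The invoke, poll, and fetch steps above can be sketched as a small polling loop. This is an illustration in Python rather than the JavaScript of application.js. The slides do not describe the /jobs response format, so the status strings and the two callables (`get_status`, `fetch_results`) are assumptions standing in for the real HTTP calls.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str],
                 fetch_results: Callable[[], dict],
                 interval: float = 2.0,
                 max_polls: int = 30,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a job until it finishes, then fetch its results.

    get_status and fetch_results stand in for HTTP calls to the /jobs API.
    The status strings "FINISHED" and "FAILED" are assumed, not documented.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "FINISHED":
            return fetch_results()
        if status == "FAILED":
            raise RuntimeError("job failed")
        sleep(interval)  # wait before asking again
    raise TimeoutError("job did not finish in time")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a client would leave it at the default.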

Lecong.cgi

• Accept gene list
• Accept control list
• Accept parameters
• Run analysis using call to R
• Return JSON data structure
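As a rough illustration of what such a service computes, the sketch below runs a hypergeometric over-representation test for a single category and returns JSON. It is written in Python, not the R code the service actually calls. It assumes the control list acts as the background gene universe and that the analysis is a hypergeometric test, as the name HyperGO suggests. The function names and the JSON fields are hypothetical.

```python
import json
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrich(gene_list, control_list, annotated) -> str:
    """Test one category for over-representation in gene_list.

    Assumption: control_list is the background; gene_list is added to it
    so that every tested gene is part of the universe.
    """
    universe = set(control_list) | set(gene_list)
    genes = set(gene_list)
    category = set(annotated) & universe
    hits = genes & category
    p = hypergeom_pvalue(len(hits), len(genes), len(category), len(universe))
    return json.dumps({"hits": sorted(hits), "p_value": p})
```

For example, with a universe of 10 genes, a list of 3, and a category of 3 of which 2 are in the list, the tail probability is (21 + 1) / 120.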

iPlant Jobs API

iPlant Data API

R/Bioconductor/HyperGO


http://medea/iplant/js/application.js

1. Interpret user interactions
   1. Sorting
   2. Downloading
   3. Invoke Network Analysis service via iPlant /jobs API
   4. Poll /jobs for completion
   5. Fetch results (GraphML)
   6. Render in Cytoscape Web

BuildNetwork

• Accept gene list
• Accept parameters (species, etc.)
• Accept algorithm name (GeneMANIA)
• Invoke GeneMANIA plugin (Java) to predict network
• Convert all gene names to AGI codes
• Convert domain-specific report to GraphML
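The final conversion step can be sketched with Python's standard library. The format of the GeneMANIA report is not shown in the slides, so this sketch assumes it has already been parsed into (source, target, weight) tuples keyed by AGI code. The function name and the `weight` attribute are illustrative choices.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges) -> str:
    """Build a GraphML document from (source, target, weight) tuples."""
    ET.register_namespace("", GRAPHML_NS)  # serialize as the default namespace
    tag = lambda name: f"{{{GRAPHML_NS}}}{name}"
    root = ET.Element(tag("graphml"))
    # Declare the edge attribute before it is used in <data> elements.
    ET.SubElement(root, tag("key"), {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "double",
    })
    graph = ET.SubElement(root, tag("graph"),
                          {"id": "G", "edgedefault": "undirected"})
    # Emit each distinct gene once as a node.
    for node_id in sorted({n for s, t, _ in edges for n in (s, t)}):
        ET.SubElement(graph, tag("node"), {"id": node_id})
    for i, (s, t, w) in enumerate(edges):
        edge = ET.SubElement(graph, tag("edge"),
                             {"id": f"e{i}", "source": s, "target": t})
        data = ET.SubElement(edge, tag("data"), {"key": "weight"})
        data.text = str(w)
    return ET.tostring(root, encoding="unicode")
```

A call such as `edges_to_graphml([("AT1G01010", "AT1G01020", 0.8)])` yields a string that a GraphML-aware viewer could load; the weights here are made up.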

iPlant Jobs API

iPlant Genome Services API

GeneMANIA


What’s next

• VAPrototype won’t see any explicit additional development, since it is a proof of principle
• We need to focus on delivering robust versions of the functions that are mocked up
• It serves as a reference implementation for a 3rd-party DE
• It also illuminates specific data integration needs
• We may use it as a testing ground for new ideas in GUI, service coordination, and API design
• It will be ported to use the full implementation of the iPlant API and used as an example for potential developers
  • Web application portion: 1 day
  • Web services: 1 week


Genome Services

Why is this needed? This is G2P not genomics!

•Support multiple genomes in UHTS services

•Support germplasms and natural accessions

•Pave the way to supporting user genomes

•Make best use of existing resources

•Sane, authority-led approach to data integration

Current Ideas

• Return a structured list of taxonomic identifiers (genus, species, version, germplasm/accession) supported by iPlant
• Given a genus, species, version, and germplasm/accession identifier:
  • Return a URI pointing to a multi-FASTA file containing the genome sequence
  • Return a URI pointing to a GFF3 version of the genome annotation
  • Return a URI pointing to a GTF version of the genome annotation
  • Return a URI pointing to the dummy expr files needed by Cufflinks for RNAseq
  • Be able to actually return the files referenced by these URIs for download
• Given the taxonomic identifier plus a name or synonym of a gene:
  • Return an authoritative name for said gene
• Given the taxonomic identifier plus a microarray platform name plus a probe identifier:
  • Return the canonical gene name mapped to that microarray probe
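A minimal sketch of these lookups, assuming an in-memory registry keyed by the four-part taxonomic identifier. Everything here is hypothetical: the `TaxonId` type, the function names, the example.org URIs, and the single illustrative record. A real service would draw on the data authorities rather than on hard-coded tables, and a probe lookup could follow the same pattern as the synonym lookup.

```python
from typing import NamedTuple, Optional


class TaxonId(NamedTuple):
    genus: str
    species: str
    version: str
    accession: str


# Illustrative record only; the example.org URIs are placeholders.
COL0 = TaxonId("Arabidopsis", "thaliana", "TAIR9", "Col-0")
GENOMES = {
    COL0: {
        "fasta": "http://example.org/genomes/ath/genome.fa",
        "gff3": "http://example.org/genomes/ath/annotation.gff3",
        "gtf": "http://example.org/genomes/ath/annotation.gtf",
    }
}
# Synonyms are stored lower-cased so lookups are case-insensitive.
SYNONYMS = {(COL0, "flc"): "AT5G10140"}


def list_taxa() -> list:
    """Return the structured list of supported taxonomic identifiers."""
    return sorted(GENOMES)


def resource_uri(taxon: TaxonId, kind: str) -> Optional[str]:
    """Return the URI for a genome resource, or None if unsupported."""
    return GENOMES.get(taxon, {}).get(kind)


def authoritative_name(taxon: TaxonId, name: str) -> Optional[str]:
    """Resolve a gene name or synonym to its authoritative identifier."""
    return SYNONYMS.get((taxon, name.lower()))
```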

Genome Services

[Diagram labels]

• iPlant Genome Services API
• Clade-specific data authorities
• NCBI and EBI
• Local knowledge
• Relationship types: mirroring, direct, indirect
• (CoGE)
• Taxonomic Name Resolution Service (TNRS)
• Discovery Environment
• TAIR, Gramene, Phytozome, etc.


The iPlant API

The iPlant API will support the following use cases:

1. I have a command-line tool that performs a specific type of bioinformatics analysis and I want to make it available to others.

2. I have a web service that performs a specific type of bioinformatics analysis and I want to make it available to others.

3. I have a web site that people can use to perform analyses and I want to make it available to others.

4. I want to write a web application that chains multiple types of tools together.

5. I want to use a workflow manager like Taverna or Kepler to orchestrate a set of analytical steps.

Architecture

Core Services

• Eventing
• I/O
• Data Transforms
• App Discovery
• Job Mgmt.
• User Profile Mgmt.
• Authentication
• User/Project Auditing
• Mashups (Orchestration)

I/O Services

Getting raw data into and out of the iPlant CI and moving data around internally

• /io: upload files and stage URIs (http, https, ftp, sftp, gsiftp, jdbc, Amazon S3, iRODS)
• /io/list: list iPlant files
• /io/<file_id>: download or delete a file
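To make the endpoint list concrete, here is a hedged sketch that builds (but does not send) requests for each I/O operation. The slides name only the paths, so the host, the HTTP verbs, and the `uri` form field are all assumptions made for illustration.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "https://api.example.org"  # placeholder host, not the real iPlant endpoint


def list_files() -> Request:
    """Request a listing of the user's files (verb assumed to be GET)."""
    return Request(f"{BASE}/io/list", method="GET")


def download_file(file_id: str) -> Request:
    """Request the contents of one file (verb assumed to be GET)."""
    return Request(f"{BASE}/io/{quote(file_id, safe='')}", method="GET")


def delete_file(file_id: str) -> Request:
    """Request deletion of one file (verb assumed to be DELETE)."""
    return Request(f"{BASE}/io/{quote(file_id, safe='')}", method="DELETE")


def stage_uri(source_uri: str) -> Request:
    """Ask the service to stage data from a remote URI.

    The form field name "uri" is hypothetical.
    """
    body = urlencode({"uri": source_uri}).encode()
    return Request(f"{BASE}/io", data=body, method="POST")
```

Each function returns a `urllib.request.Request`, which a caller could pass to `urllib.request.urlopen` once the real host and authentication scheme are known.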

Job Management Services

Submitting and managing jobs to run supported applications as well as querying for historical information about jobs

• /job: submit a job
• /job/history: retrieve job history
• /job/<job_id>: kill an active job or get information about a job
• /job/<job_id>/input/list: get a listing of the input files associated with a specific job
• /job/<job_id>/input/<file_id>: retrieve a specific input file in the format it was in when the job ran
• /job/<job_id>/output/list: get a listing of the output files associated with a specific job
• /job/<job_id>/output/<file_name>: retrieve a specific output file associated with the job
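One way to read the job endpoints is as a route table. The sketch below maps a request path to an action name plus its path parameters. The action names are hypothetical, and the ordering matters: fixed segments such as `history` and `list` are tried before the patterns that capture an identifier in the same position.

```python
import re
from typing import Optional, Tuple

# (pattern, action) pairs; literal routes come before parameterized ones.
ROUTES = [
    (r"^/job$", "submit"),
    (r"^/job/history$", "history"),
    (r"^/job/(?P<job_id>[^/]+)$", "job"),
    (r"^/job/(?P<job_id>[^/]+)/input/list$", "list_inputs"),
    (r"^/job/(?P<job_id>[^/]+)/input/(?P<file_id>[^/]+)$", "get_input"),
    (r"^/job/(?P<job_id>[^/]+)/output/list$", "list_outputs"),
    (r"^/job/(?P<job_id>[^/]+)/output/(?P<file_name>[^/]+)$", "get_output"),
]


def resolve(path: str) -> Optional[Tuple[str, dict]]:
    """Return (action, params) for the first matching route, else None."""
    for pattern, action in ROUTES:
        match = re.match(pattern, path)
        if match:
            return action, match.groupdict()
    return None
```

A consequence of this design is that a job could never be given the identifier `history`, and an input file could never be given the identifier `list`.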

Application Discovery Services

Application discovery and management (different from semantic web service discovery)

• /apps: add a new application to the iPlant CI
• /apps/list: list all supported applications
• /apps/search: search for a specific application
• /apps/type/list: list all supported application types
• /apps/type/<type_name>: list all supported applications of a specific type
• /apps/name/<app_name>: list all supported applications matching a given name
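The discovery operations above could be backed by something as simple as the in-memory catalog sketched here. The `App` record, its fields, the type names, and the three entries are illustrative assumptions; the slides do not define the application metadata model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    type_name: str
    description: str


# Hypothetical catalog entries used only to exercise the functions below.
CATALOG = [
    App("limma", "expression", "Linear models for microarray data"),
    App("BiNGO", "enrichment", "GO term over-representation"),
    App("GeneMANIA", "network", "Gene network prediction"),
]


def list_types() -> list:
    """Mirror /apps/type/list: all distinct application types."""
    return sorted({app.type_name for app in CATALOG})


def apps_of_type(type_name: str) -> list:
    """Mirror /apps/type/<type_name>."""
    return [app for app in CATALOG if app.type_name == type_name]


def apps_by_name(app_name: str) -> list:
    """Mirror /apps/name/<app_name>, matching case-insensitively."""
    return [app for app in CATALOG if app.name.lower() == app_name.lower()]


def search(query: str) -> list:
    """Mirror /apps/search with a substring match on name and description."""
    q = query.lower()
    return [app for app in CATALOG
            if q in app.name.lower() or q in app.description.lower()]
```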