Cytoscape: Now and Future

Post on 07-Jul-2015

description

Presentation slides for Ideker Lab meeting on 12/1/2014. Briefly describes current status of the projects and future vision.

transcript

Cytoscape: Now and Future

Keiichiro Ono

UCSD Trey Ideker Lab, Cytoscape Core Team

Lab Meeting 12/1/2014

Now / Future

Now: Current Status

- Overall Project Status

- Cytoscape 3.2

- cyREST

Future

Service-oriented workflow

Cytoscape-as-a-service

Docker

Reproducible Science

Web Technologies

NIH The Commons

Status

> 700 downloads/month

App Store Daily Download Plot

Project Status

Cytoscape Core:

Apps:

Publications:

?

Status: Cytoscape Core

Cytoscape 3.2

- Released early November

- What’s New?

- Chart Editor

- Export as Web Application

- Performance Improvements

- Lots of bug fixes

Chart Editor

- Visualize multiple data points in a single view

- Time series data

- Multiple GO terms

- Chart types: Bar, Box, Pie, Heat Map, Ring

- Part of standard Visual Style Editor

- Everything will be saved into session files

Gradient Editor

Export as Web Application

Exporting Cytoscape-generated visualizations as a complete web application using Cytoscape.js

cyREST

Users

User Type I

- Average computing skills

- Use Excel as their primary workbench for data analysis

- For them, bioinformatics means using some of NCBI/EBI web tools or DAVID

- Have tons of data not analyzed / visualized yet

- Excel is my friend.

User Type II

- Advanced computing skills

- Use Python + SciPy/NumPy, R + Bioconductor, or MATLAB every day

- If necessary, write their own packages

- Use HPC technologies a lot

- Manual operation is evil.

Both of them are Important!

- Type I: “Bench Biologists”

- Domain experts

- Data producers

- Type II: Computational Biologists

- Experts of large-scale data analysis

- Especially important for genome-scale data analysis

They have been ignored for a long time in the Cytoscape world…

User Type II

- Advanced computing skills

- Use Python + SciPy/NumPy, R + Bioconductor, or MATLAB every day

- If necessary, write their own packages

- Use HPC technologies a lot

- Manual operation is evil.

Requests from Type II Users

- I have 200 networks in my session and I need to create one PDF per view. How can I do it with Cytoscape?

- I need to use igraph for network analysis, but its visualization feature is limited. I want to use Cytoscape as an external visualization engine for R.

- Usually I use IPython Notebook to record my work. How can I integrate Cytoscape into my workflow?

- I want to generate Style for each time point and create small multiples of networks.

What is cyREST?

- Platform-independent, RESTful API module for Cytoscape

- Means you can access basic Cytoscape data objects programmatically

Diagram: cyREST links Cytoscape Desktop (Core and Apps, running on your workstation) over the Internet, acting as a data bus, to:

- Interactive data analysis environments: RStudio (igraph, RCurl) and IPython Notebook (NumPy, SciPy, Pandas, NetworkX)

- Command line tools: sed, awk, grep, curl

- Web browsers

- In-house databases

- External computing resources: graph layout, statistical analysis, data pre-processing

- File / code hosting services

- Public data repositories: PSICQUIC Services, EBI RDF Platform

- Other bioinformatics web applications / services

- Data repository & collaboration services

- Cytoscape App Store

Diagram: clients send POST, PUT, DELETE, and GET requests to Cytoscape 3.1+ through cyREST.

Mapping Cytoscape API to HTTP Methods

Cytoscape operation: HTTP method

- Create: POST

- Read: GET

- Update: PUT

- Delete: DELETE
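The mapping above can be sketched in a few lines of Python. This is only an illustration, not cyREST client code: the base URL is taken from the example on the slides, while `CRUD_TO_HTTP`, `build_request`, and the idea of creating a network by POSTing to `networks` are assumptions made for this sketch. The function builds requests without sending them, so it runs without Cytoscape.

```python
import json
import urllib.request

# Base URL from the example on the slides; adjust the port to your cyREST setup.
BASE_URL = "http://localhost:1234/v1"

# Cytoscape (CRUD) operation -> HTTP method, as in the mapping above.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, resource, payload=None):
    """Build (but do not send) an HTTP request for a CRUD operation.

    `resource` is a path below BASE_URL such as "networks/52".
    `payload`, if given, is serialized as the JSON request body.
    """
    method = CRUD_TO_HTTP[operation]
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = f"{BASE_URL}/{resource.lstrip('/')}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    request = build_request("read", "networks/52")
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running Cytoscape instance with cyREST installed.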

Get full network with unique ID 52 as JSON

GET http://localhost:1234/v1/networks/52
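From Python, the same call might look like the sketch below. It assumes Cytoscape is running locally with cyREST listening on port 1234, as in the URL above. The helper names `network_url` and `get_network` are made up for this example, and the structure of the returned JSON is not described on the slides, so the code only parses it.

```python
import json
import urllib.request


def network_url(network_id, host="localhost", port=1234):
    """Return the cyREST URL of a network, following the URL shown above."""
    return f"http://{host}:{port}/v1/networks/{network_id}"


def get_network(network_id, opener=urllib.request.urlopen):
    """GET a network and return the parsed JSON document.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a running Cytoscape instance.
    """
    with opener(network_url(network_id)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Cytoscape running, `get_network(52)` would return the network whose unique ID is 52 as a Python object.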

Demo: Cytoscape Controlled from IPython Notebook

http://bit.ly/1wcKXVV

Ready to Use Now!

http://apps.cytoscape.org/apps/cyrest

Future

History

2005

- Cytoscape 2.2: Simple Java Application

- Google released an application called Google Maps beta

- “Re-discovery” of JavaScript, or Ajax

2014

- Cytoscape 3.2.0: (Modularized) Java Application

- Client applications are migrating to the web browsers

- “Pure” desktop applications are dying slowly…

- Even desktop applications depend on external services

- JavaScript everywhere

- Cloud Computing

- Scale-out over scale-up

Trend in Software Design

- An application is a collection of smaller services

- JavaScript is a first-class citizen in the world of programming languages

- Design application with cloud services in mind

http://12factor.net/

In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps that:

• Use declarative formats for setup automation, to minimize time and cost for new developers joining the project

• Have a clean contract with the underlying operating system, offering maximum portability between execution environments

• Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration

• Minimize divergence between development and production, enabling continuous deployment for maximum agility

• And can scale up without significant changes to tooling, architecture, or development practices.

This MANIFESTO counters current trends in bioinformatics where institutes and companies are creating monolithic software solutions aimed mostly at end-users.

Let’s see what’s happening in (scientific) computing…

Bioinformatics Open Source Conference (BOSC) in Boston

@New York Times

@Facebook HQ

What I Have Learned…

- Python is becoming the standard language for “Data Scientists”

- Python itself is a very slow language, but is a perfect glue

- Lots of tools are made by scientists (e.g. Anaconda by Continuum)

- They do understand current problems in modern scientific computing, and are trying to solve them

What I Have Learned…

- Data visualization

- Visualization needs vary, especially for complex data sets like those from the life science domain

- For that purpose, Java is not the best language to implement applications

- Even large-scale data visualization applications are moving to the web browsers

- Canvas (Cytoscape.js), WebGL (Three.js), SVG (D3.js)

- Most of the talented hackers are working on the web browsers, i.e., JavaScript

WikiGalaxy: http://wiki.polyfra.me/#

Problems in Scientific Computing

- No more free lunch

- Even if you buy expensive machines, you no longer get performance gains for free. You have to design your code for massively distributed environments. (From Scale-up to Scale-out)

- Complex Data Analysis Pipeline

- Need to build pipeline by connecting multiple resources, or services

- Needs for complex, customized data visualization

- Reproducibility

➡ But building, deploying, and maintaining a reproducible pipeline is not straightforward

What does this mean to biologists?

- “Omics-Scale” Data Analysis

- Need computing power beyond your workstations

- Need to build pipelines by connecting multiple resources, or services

➡ But developing, deploying, and maintaining a reproducible, or “portable”, pipeline is not straightforward

What does this mean to biologists?

- Collaboration between scientists and software engineers will be more important

- Scientists should spend their time on science, not on the details of JavaScript syntax or how to build large-scale pipelines

- In other words, building a bioinformatics computing environment is itself a research project

What does this mean to Cytoscape team?

- Cytoscape should work nicely with other tools

- All bioinformatics tools should work as a building block of large workflows

- In the long term, Cytoscape should be a collection of services

Universe of Tools for Bioinformatics

Cytoscape as a Collection of Services

Case Study 1

PanGIA App

Srivas, Rohith et al. “Assembling Global Maps of Cellular Function through Integrative Analysis of Physical and Genetic Networks.” Nature Protocols 6.9 (2011): 1308–1323. PMC. Web. 1 Dec. 2014.

History of PanGIA Application

Diagram: one biological problem, implemented several times over.

- Boxes: Core algorithm 1, 2, … n as Python; Java Implementation of Algorithms; Cytoscape 2.x Plugin; Cytoscape 3.x App; PanGIA Service (Implement in Python again…?)

- Contributor labels: by Sourav; by Greg, Rohith; by Greg, Rohith and Cytoscape Team; by David

Lots of Duplicate Efforts!

Case Study 2

NeXO Web

- Term Enrichment Analysis

- From a list of genes, perform a hypergeometric test over the set of machine-generated ontology (NeXO) terms and display the terms with their p-values

- It is independent of all other parts of the NeXO Web application
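As a rough illustration of the test described above, the following sketch computes a one-sided hypergeometric p-value for each term using only the Python standard library. It is not the NeXO Web code: the function names, the term-to-genes dictionary, and the explicit background set are assumptions for this example, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    `sample` genes are drawn without replacement from `population` genes,
    of which `annotated` belong to the term being tested.
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def enrich(gene_list, term_to_genes, background):
    """Return (term, p-value) pairs sorted by ascending p-value."""
    universe = set(background)
    query = set(gene_list) & universe
    results = []
    for term, genes in term_to_genes.items():
        members = set(genes) & universe
        hits = len(query & members)
        p_value = hypergeom_pvalue(len(universe), len(members), len(query), hits)
        results.append((term, p_value))
    return sorted(results, key=lambda pair: pair[1])
```

Calling `enrich(my_genes, terms, all_genes)` returns the terms ordered from most to least enriched. With SciPy available, the tail probability could be computed with `scipy.stats.hypergeom` instead.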

Overview of NeXO Term Enrichment Service

Diagram: NeXO Web calls the Term Enrichment Service through a RESTful API. The service API is built with Flask on top of a Python core that uses SciPy and NumPy.

Option 1: As a Cytoscape App

- Re-implement this algorithm as a Cytoscape App (Java Application)

- Pros:

- Easy to install

- Cons:

- A lot of work…

- Should be written in Java

- Does not scale out!

Option 2: As a Service

- Wrap existing applications and deploy them to the platform of the users’ choice:

- Laptops, private servers, and commercial cloud services (AWS/Google Computing Cloud, etc.)

- Pros:

- Scales out

- Client-independent

- Workflow-friendly

- Cons:

- Need to adapt to a new way of software design

- Relatively more complex deployment
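To make Option 2 concrete, here is a minimal sketch of wrapping an existing Python function as an HTTP service. The NeXO service in these slides uses Flask; this example uses the standard library's `wsgiref` instead so that it has no dependencies. `count_genes` is a hypothetical stand-in for a real analysis, and the JSON request format and port are likewise assumptions.

```python
import json
from wsgiref.simple_server import make_server


def count_genes(payload):
    """Hypothetical stand-in for an existing analysis function."""
    return {"gene_count": len(set(payload.get("genes", [])))}


def _respond(start_response, status, obj):
    """Send `obj` as a JSON response with the given HTTP status line."""
    body = json.dumps(obj).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def make_app(analysis):
    """Wrap a plain function (dict in, dict out) as a WSGI application."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            return _respond(start_response, "405 Method Not Allowed",
                            {"error": "use POST"})
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length)
        try:
            payload = json.loads(raw.decode("utf-8") or "{}")
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            return _respond(start_response, "400 Bad Request",
                            {"error": "invalid JSON body"})
        return _respond(start_response, "200 OK", analysis(payload))

    return app


def serve(app, host="localhost", port=5000):
    """Run the app with the reference WSGI server (blocks until interrupted)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Running `serve(make_app(count_genes))` starts the service locally. Any client that can send an HTTP POST (Cytoscape, IPython, R, or curl) can then call the same code, which is the client independence listed under Pros.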

Summary

- Best practice: implement future applications as services, then call them from Cytoscape, IPython, RStudio, and other tools

- To make your algorithms available to both Type I (domain experts) and Type II (hardcore computational biologists) users, it is better to deploy them as a service, instead of an App

Is the technology available to implement such applications / workflows?

Yes!

Key: Provenance

Data

Workflow

Environment

Software Distribution Problem

- “It-worked-on-my-machine” syndrome

- This is a serious problem especially when you want to share your workflow with collaborators.

What is Docker?

- Container to run applications in an isolated environment

- Application = layers of images

- Sharable Environments

- Environments as code

https://www.docker.com/whatisdocker/

Docker Hub

- Sharing environments as code!

- Dockerfile - Definition of your container

- Example: http://bit.ly/15N23P8

Goal: Reproducible Science

We (the NIH) Are Working On, But As Yet Do Not Have Good Answers To:

1. Today, how much are we actually spending on data and software related activities?

2. How much should we be spending to achieve the maximum benefit to biomedical science relative to what we spend in other areas?

Biomedical Research as an Open Digital Enterprise, by Philip E. Bourne, Ph.D., Associate Director for Data Science (NIH)

Reproducibility

- Most of the 27 Institutes and Centers of the NIH are currently reviewing the ability to reproduce research they are funding

- The NIH recently convened a meeting with publishers to discuss the issue – a set of guiding principles arose

From “The Cytoscape” to “a Cytoscape”

- Shares Core Concepts

- Graph Model

- Table associated with graph

- Style (Collection of visual mappings)

- Implemented as different collections of services

- Desktop Cytoscape

- Interactive network data visualizer on the web

- Optimized for ontology browsing (i.e., future version of NeXO Web)

Photo Credits

• https://flic.kr/p/bFZpyg

• https://flic.kr/p/bmXUz1

2014 Keiichiro Ono kono@ucsd.edu