
The power of graphs to analyze biological data

Date post: 05-Dec-2014
Upload: datablend
Page 1: The power of graphs to analyze biological data

the power of graphs for analyzing biological datasets

Davy Suvee

Janssen Pharmaceutica

Page 2: The power of graphs to analyze biological data

about me

➡ working as an IT lead / software architect @ janssen pharmaceutica
• dealing with big scientific data sets
• hands-on expertise in big data and NoSQL technologies

who am i ...

Davy Suvee @DSUVEE

➡ founder of datablend
• provide big data and NoSQL consultancy
• share practical knowledge and big data use cases via blog

Page 3: The power of graphs to analyze biological data

outline

➡ getting visual insights into big data sets
★ gene expression clustering (MongoDB, Neo4j, Gephi)
★ mutation prevalence (Cassandra, Neo4j, Gephi)

➡ fluxgraph, a time machine for your graphs ...

Page 4: The power of graphs to analyze biological data

insights in big data

➡ typical approach through warehousing
★ star schema with fact tables and dimension tables

Page 6: The power of graphs to analyze biological data

insights in big data

★ real-time visualization
★ filtering
★ metrics
★ layout
★ modular [1, 2]

1. http://gephi.org/plugins/neo4j-graph-database-support/
2. http://github.com/datablend/gephi-blueprints-plugin

Page 7: The power of graphs to analyze biological data

gene expression clustering

➡ oncology data set:
★ 4,800 samples
★ 27,000 genes

➡ question:
★ for a particular subset of samples, which genes are co-expressed?

Page 8: The power of graphs to analyze biological data

mongodb for storing gene expressions

{ "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
  "sample_name" : "122551hp133a21.cel",
  "genomics_id" : 122551,
  "sample_id" : 343981,
  "donor_id" : 143981,
  "sample_type" : "Tissue",
  "sample_site" : "Ascending colon",
  "pathology_category" : "MALIGNANT",
  "pathology_morphology" : "Adenocarcinoma",
  "pathology_type" : "Primary malignant neoplasm of colon",
  "primary_site" : "Colon",
  "expressions" : [ { "gene" : "X1_at", "expression" : 5.54217719084415 },
                    { "gene" : "X10_at", "expression" : 3.92335121981739 },
                    { "gene" : "X100_at", "expression" : 7.81638155662255 },
                    { "gene" : "X1000_at", "expression" : 5.44318512260619 },
                    … ] }

Page 9: The power of graphs to analyze biological data

pearson correlation through map-reduce

pearson correlation

x    y
43   99
21   65
25   79
42   75
57   87
59   81

r = 0.52
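The slide title mentions map-reduce, but the deck does not show the job itself. A minimal sketch of why Pearson correlation fits that model, assuming the usual decomposition into additive sums: each observation is mapped to a small vector of partial sums, the vectors are added in any order, and the coefficient is computed once at the end. The class and method names below are hypothetical and are not taken from the author's code.

```java
/** Pearson correlation from additive partial sums (illustrative sketch only). */
final class PearsonSketch {

    /** "Map" step: one (x, y) observation becomes {n, sum x, sum y, sum x^2, sum y^2, sum xy}. */
    static double[] map(double x, double y) {
        return new double[] {1, x, y, x * x, y * y, x * y};
    }

    /** "Reduce" step: partial sums combine by element-wise addition, in any order. */
    static double[] reduce(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Final step: turn the accumulated sums into the correlation coefficient. */
    static double finish(double[] s) {
        double n = s[0], sx = s[1], sy = s[2], sxx = s[3], syy = s[4], sxy = s[5];
        double cov = n * sxy - sx * sy;
        double varX = n * sxx - sx * sx;
        double varY = n * syy - sy * sy;
        return cov / Math.sqrt(varX * varY);
    }

    /** Convenience wrapper that runs map, reduce and finish sequentially. */
    static double correlation(double[] x, double[] y) {
        double[] acc = new double[6];
        for (int i = 0; i < x.length; i++) {
            acc = reduce(acc, map(x[i], y[i]));
        }
        return finish(acc);
    }

    public static void main(String[] args) {
        // The six (x, y) pairs from the slide.
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        System.out.println("r = " + correlation(x, y));
    }
}
```

For the slide's six pairs this evaluates to roughly 0.5298, which the slide shows truncated to two decimals. Because `reduce` is plain addition, the partial sums can be computed per chunk of samples and merged afterwards.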

Page 10: The power of graphs to analyze biological data

co-expression graph

➡ create a node for each gene
➡ if correlation between two genes >= 0.8, draw an edge between both nodes
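The two rules above can be sketched in plain Java with adjacency sets instead of a graph database. The gene names are borrowed from the MongoDB example; the correlation values are made up for illustration, and `CoExpressionSketch` is a hypothetical name, not part of the author's pipeline.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Builds an undirected co-expression graph from a correlation matrix (illustrative sketch). */
final class CoExpressionSketch {

    /** One node per gene; an edge between two genes when their correlation is >= threshold. */
    static Map<String, Set<String>> build(String[] genes, double[][] corr, double threshold) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (String gene : genes) {
            graph.put(gene, new TreeSet<>()); // create a node for each gene
        }
        for (int i = 0; i < genes.length; i++) {
            for (int j = i + 1; j < genes.length; j++) { // each unordered pair once
                if (corr[i][j] >= threshold) {
                    graph.get(genes[i]).add(genes[j]);
                    graph.get(genes[j]).add(genes[i]);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        String[] genes = {"X1_at", "X10_at", "X100_at"};
        // Made-up correlation values, for illustration only.
        double[][] corr = {
            {1.00, 0.85, 0.10},
            {0.85, 1.00, 0.30},
            {0.10, 0.30, 1.00}
        };
        System.out.println(build(genes, corr, 0.8));
    }
}
```

With 27,000 genes the number of pairs is in the hundreds of millions, so the threshold also serves to keep the resulting graph sparse enough to visualize.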

Page 11: The power of graphs to analyze biological data

co-expression graph

Page 12: The power of graphs to analyze biological data

graphs and time ...

➡ fluxgraph: a blueprints-compatible graph on top of Datomic

➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison

➡ towards a time-aware graph ...

➡ reproducible graph state

Page 17: The power of graphs to analyze biological data

travel through time

FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...

Edge e1 = fg.addEdge(davy, peter, "knows");

[Diagram: vertices Davy, Peter and Michael; a "knows" edge from Davy to Peter]

Page 21: The power of graphs to analyze biological data

travel through time

Date checkpoint = new Date();

davy.setProperty("name", "David");

Edge e2 = fg.addEdge(davy, michael, "knows");

[Diagram: vertices David, Peter and Michael; "knows" edges from David to Peter and from David to Michael]

Page 22: The power of graphs to analyze biological data

travel through time

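[Diagram: two states of the graph side by side. At the checkpoint: Davy, Peter and Michael, with a "knows" edge from Davy to Peter. At the current time: David, Peter and Michael, with "knows" edges from David to Peter and from David to Michael. By default, the graph is read at the current time.]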

Page 23: The power of graphs to analyze biological data

travel through time

[Diagram: the same two graph states, at the checkpoint and at the current time]

fg.setCheckpointTime(checkpoint);
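The idea of reading a graph "as of" a checkpoint can be illustrated without Datomic. This is a minimal sketch, assuming each property keeps its full write history keyed by time; `VersionedProperty` is a hypothetical class and says nothing about how FluxGraph is implemented internally.

```java
import java.util.Map;
import java.util.TreeMap;

/** A single property whose past values stay readable (illustrative sketch). */
final class VersionedProperty {

    // Write time -> value written at that time, kept in time order.
    private final TreeMap<Long, String> history = new TreeMap<>();

    void set(long time, String value) {
        history.put(time, value);
    }

    /** Value visible at the given time: the latest write at or before it, or null if none. */
    String valueAt(long time) {
        Map.Entry<Long, String> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    /** Latest value, which corresponds to reading at the current time. */
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedProperty name = new VersionedProperty();
        name.set(1L, "Davy");
        long checkpoint = 2L;   // taken after the first write
        name.set(3L, "David");  // rename happens after the checkpoint

        System.out.println(name.current());           // reads the latest write
        System.out.println(name.valueAt(checkpoint)); // reads the state as of the checkpoint
    }
}
```

Nothing is overwritten: the rename adds a new entry, so the state at the checkpoint remains reproducible.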

Page 24: The power of graphs to analyze biological data

time-scoped iteration

[Diagram: timeline t1, t2, t3, tcurrent; each change produces a new version of the vertex: Davy, Davy', Davy'', Davy''']

➡ how to find the version of the vertex you are interested in?

Page 29: The power of graphs to analyze biological data

time-scoped iteration

[Diagram: the versions Davy (t1), Davy' (t2), Davy'' (t3) and Davy''' (tcurrent), linked by "next" and "previous" pointers]

Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
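The navigation calls above (previous version, next versions, validity interval) can be pictured as lookups in a time-ordered map. The sketch below is an assumption-laden stand-in, not FluxGraph's implementation: `VersionChain` and its methods are hypothetical, and times are plain `long` values rather than dates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A chain of versions of one element, ordered by start time (illustrative sketch). */
final class VersionChain {

    // Start time of a version -> the version's value.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void record(long time, String value) {
        versions.put(time, value);
    }

    /** The closest version that starts strictly before the given time, or null. */
    String previousVersion(long time) {
        Map.Entry<Long, String> entry = versions.lowerEntry(time);
        return entry == null ? null : entry.getValue();
    }

    /** All versions that start strictly after the given time, oldest first. */
    List<String> nextVersions(long time) {
        return new ArrayList<>(versions.tailMap(time, false).values());
    }

    /**
     * The [start, end) interval of the version in effect at the given time.
     * The end is Long.MAX_VALUE when that version is still current; null if no version exists yet.
     */
    long[] validInterval(long time) {
        Long start = versions.floorKey(time);
        if (start == null) {
            return null;
        }
        Long end = versions.higherKey(start);
        return new long[] {start, end == null ? Long.MAX_VALUE : end};
    }

    public static void main(String[] args) {
        VersionChain davy = new VersionChain();
        davy.record(1L, "Davy");
        davy.record(2L, "Davy'");
        davy.record(3L, "Davy''");
        davy.record(4L, "Davy'''");

        System.out.println(davy.previousVersion(3L)); // version before the one starting at t3
        System.out.println(davy.nextVersions(1L));    // everything after the first version
    }
}
```

A filter such as the one passed to `getPreviousVersions(filter)` would, in this picture, simply be a predicate applied while walking the map backwards.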

Page 32: The power of graphs to analyze biological data

time-scoped iteration

➡ when does an element change?

➡ vertex:
★ setting or removing a property
★ adding it to or removing it from an edge
★ being removed

➡ edge:
★ setting or removing a property
★ being removed

➡ ... and each element is time-scoped!

Page 33: The power of graphs to analyze biological data

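temporal graph comparison

[Diagram: the graph at the current time (David, Peter, Michael; "knows" edges from David to Peter and from David to Michael) next to the graph at the checkpoint (Davy, Peter, Michael; a "knows" edge from Davy to Peter)]

what changed?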

Page 35: The power of graphs to analyze biological data

temporal graph comparison

➡ difference (A, B) = union (A, B) - B

➡ ... as an (immutable) graph!

[Diagram: difference (current, checkpoint) = the David vertex and the new "knows" edge]
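The set identity on this slide can be checked on the Davy/David example by treating every vertex state and edge as one element of a set. This is only a sketch of the set arithmetic: the string encoding of elements and the `GraphDiffSketch` class are hypothetical, and FluxGraph's own `difference` returns a graph rather than a set of strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Graph difference as set arithmetic over element states (illustrative sketch). */
final class GraphDiffSketch {

    /** difference(A, B) = union(A, B) - B, returned as a read-only set. */
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);    // union(A, B)
        result.removeAll(b); // ... minus B
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        // Hypothetical encoding: one string per vertex state or edge.
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
            "v1:name=Davy", "v2:name=Peter", "v3:name=Michael",
            "e1:v1-knows->v2"));
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
            "v1:name=David", "v2:name=Peter", "v3:name=Michael",
            "e1:v1-knows->v2", "e2:v1-knows->v3"));

        // Leaves only what is present now but was not present at the checkpoint.
        System.out.println(difference(current, checkpoint));
    }
}
```

Only the renamed vertex and the second "knows" edge survive, which matches the result drawn on the slide. Wrapping the result as unmodifiable mirrors the slide's point that the difference is itself immutable.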

Page 36: The power of graphs to analyze biological data

use case: longitudinal patient data

[Diagram: one patient vertex followed along a timeline t1 to t5: no attached nodes at t1, "smoking" at t2 and t3, "cancer" at t4, "cancer" and "death" at t5]

Page 38: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ historical data for 15,000 patients over a period of 10 years (2001-2010)

➡ example analysis:
★ if a male patient is no longer smoking in 2005
★ what are the chances of getting lung cancer in 2010, comparing
   • patients that smoked before 2005
   • patients that never smoked

Page 41: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ get all male non-smokers in 2005

fg.setCheckpointTime(new DateTime(2005,12,31).toDate());

Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();

while (males.hasNext()) {
    Vertex p2005 = males.next();
    boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
}

Page 42: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ which patients were smoking before 2005?

boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
    public TimeAwareElement filter(TimeAwareVertex element) {
        return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
    }
}).iterator().hasNext();

Page 44: The power of graphs to analyze biological data

use case: longitudinal patient data

➡ which patients have cancer in 2010?

Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());   // smokerws: working set of smokers

➡ extract the patients that have an edge to the cancer node
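The three steps of the analysis (non-smokers in 2005, split by whether they smoked before 2005, cancer status by 2010) can be walked through on in-memory records. This is a rough sketch under simplifying assumptions: it ignores gender, uses years instead of dates, and the four patients are made-up toy records, not data from the study. `CohortSketch` and `Patient` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Compares cancer rates of ex-smokers and never-smokers (illustrative sketch, toy data). */
final class CohortSketch {

    /** Toy patient: smoking status changes by year, plus an optional year of cancer diagnosis. */
    static final class Patient {
        final TreeMap<Integer, Boolean> smoking = new TreeMap<>();
        Integer cancerYear; // null when no diagnosis was recorded

        Patient smoke(int year, boolean status) { smoking.put(year, status); return this; }
        Patient cancer(int year) { cancerYear = year; return this; }

        /** Status in effect during the given year: the latest change at or before it. */
        boolean smokingIn(int year) {
            Map.Entry<Integer, Boolean> entry = smoking.floorEntry(year);
            return entry != null && entry.getValue();
        }

        /** True if any status recorded strictly before the given year was "smoking". */
        boolean smokedBefore(int year) {
            return smoking.headMap(year).containsValue(true);
        }

        boolean cancerBy(int year) {
            return cancerYear != null && cancerYear <= year;
        }
    }

    /** Fraction of the group with a cancer diagnosis by the given year; NaN for an empty group. */
    static double cancerRate(List<Patient> group, int year) {
        if (group.isEmpty()) {
            return Double.NaN;
        }
        long hits = group.stream().filter(p -> p.cancerBy(year)).count();
        return (double) hits / group.size();
    }

    public static void main(String[] args) {
        // Made-up records, for illustration only.
        List<Patient> all = Arrays.asList(
            new Patient().smoke(2001, true).smoke(2004, false).cancer(2009), // quit in 2004
            new Patient().smoke(2001, true).smoke(2003, false),              // quit in 2003
            new Patient().smoke(2001, false),                                // never smoked
            new Patient().smoke(2001, true));                                // still smoking in 2005

        List<Patient> exSmokers = new ArrayList<>();
        List<Patient> neverSmokers = new ArrayList<>();
        for (Patient p : all) {
            if (p.smokingIn(2005)) {
                continue; // keep only patients who are not smoking in 2005
            }
            (p.smokedBefore(2005) ? exSmokers : neverSmokers).add(p);
        }

        System.out.println("ex-smokers:    " + cancerRate(exSmokers, 2010));
        System.out.println("never-smokers: " + cancerRate(neverSmokers, 2010));
    }
}
```

In the graph version, `smokingIn` corresponds to reading edges at the 2005 checkpoint, `smokedBefore` to filtering previous versions, and `cancerBy` to checking for an edge to the cancer node in the difference graph.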

Page 45: The power of graphs to analyze biological data

Questions?

