Regions of Interest

Post on 19-Dec-2015


Regions of Interest: Alternative Storage

Overview

What’s in an ROI?
Use cases
Requirements
Current Storage System
Problems
Alternative Storage

What’s in an ROI?

ROI Geometry
Measurements
ROI on Channel
Annotations
▪ ROI
▪ Measurement
▪ Links
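The parts listed on this slide can be sketched as a small set of Python data classes. This is only an illustration of how geometry, measurements, channel association, annotations and links might hang together; the class and field names (`Tag`, `Measurement`, `Shape`, `Roi`, `links`) and the sample values are assumptions made for the sketch, not the actual OMERO model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tag:
    tag: str
    namespace: str


@dataclass
class Measurement:
    name: str                      # e.g. "area" (geometry) or "mean" (intensity)
    value: float
    channel: Optional[int] = None  # set when the value is measured on one channel
    tags: List[Tag] = field(default_factory=list)


@dataclass
class Shape:
    type: str                      # e.g. "Ellipse"
    z: int
    t: int
    geometry: Dict[str, float]     # shape-specific parameters such as cx, cy, rx, ry
    measurements: List[Measurement] = field(default_factory=list)
    tags: List[Tag] = field(default_factory=list)


@dataclass
class Roi:
    id: int
    shapes: List[Shape] = field(default_factory=list)
    tags: List[Tag] = field(default_factory=list)
    links: List[int] = field(default_factory=list)  # ids of linked ROI


# Illustrative values only: one ROI with one ellipse and one channel measurement.
roi = Roi(id=565, tags=[Tag("foofoo", "bob")])
ellipse = Shape("Ellipse", z=0, t=0, geometry={"cx": 3, "cy": 75, "rx": 17, "ry": 17})
ellipse.measurements.append(Measurement("mean", 42.0, channel=1))
roi.shapes.append(ellipse)
```

Annotations appear at three levels here (ROI, shape, measurement), which is the property the later storage comparison keeps returning to.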

Use Cases

User created ROI
▪ Measurement tools

HCS generated ROI
▪ Automatic
▪ External

External analysis
▪ Particle Tracking
▪ Other

Templates
▪ ROIs without images

Use Cases – Human Generated

More interactions
▪ Merge, Propagate, Split, Delete

Measurements
▪ Geometry
▪ Intensity
▪ Path

ROI/ROI Links
Tags mostly on ROI
Write Many/Read Many

Use Cases - HCS

HCS Generated ROI
Lots of ROI
Attached to Channel
Measurements Attached
▪ Multiple measurements

Tags on ROI, Measurements
▪ Analysis, results and meta.

Write Once, Read Many

Use Cases – External Tools

External Tool can Generate ROI (+ scripts)
Can be tagged
Links (ROI/ROI, ROI/Image)
Results can be in any format

Use Cases - Templates

ROI need not be attached to image
Template to define other ROI

ROI from the Nth Dimension

N-Dimensional Data
Storage of image data is simple
ROI are more complex
▪ Database entry, file format

We don’t just want to store in HDF

Current Storage Solutions

Database
▪ ROI
▪ ROI Annotations

PyTables
▪ Mask ROI
▪ Measurements

Current Status

PyTables
ROI are heterogeneous
Concurrency
Python behind a core service call
Measurements are optimal
Tagging is an issue
▪ Inside file
▪ Multiple annotations reported to be slow

Database

ROI can be stored in database
Mask data can be an issue
Tagging in an RDB is not the best fit
Many more annotations than we’d like
Link to external source for measurements

Alternative Storage

Key-Value Pair Stores
▪ Berkeley DB
▪ Project Voldemort
▪ Tokyo Cabinet

Document DB
▪ MongoDB
▪ CouchDB

Graph DB
▪ Neo4J
▪ InfoGrid

Table DB
▪ Cassandra
▪ Hypertable
▪ HBase

MongoDB

Document Database
NoSQL movement
Schemaless
No Tables
▪ Collections of like data

No Joins
▪ Document is equivalent of row of data
▪ Distributed file system (GridFS)

MongoDB – Pros and Cons

Pros
It has bindings to numerous languages (C++, C#, Java, Python, ...).
Allows storage, indexing, linking of any user data
Annotations are now very easy, efficient
Has mechanisms for schema upgrade
Dynamic Queries
Replication
Sharding.
Map-Reduce framework.
Fast.
GridFS is a distributed file storage mechanism within Mongo.
Easy to install

Cons
Schemaless, data integrity will need to be worked on.
Graph structures not inherently supported.
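Because the store is schemaless, integrity checks have to live in the application. The following is a minimal sketch of what such a check might look like for an ROI document before it is inserted. The function name `validate_roi` and the set of required shape keys are assumptions chosen for illustration; the real rules would come from the ROI model.

```python
# Keys every shape is assumed to need in this sketch.
REQUIRED_SHAPE_KEYS = {"type", "z", "t", "id"}


def validate_roi(doc):
    """Return a list of problems found in an ROI document (empty if none)."""
    problems = []
    if doc.get("type") != "Roi":
        problems.append("document type must be 'Roi'")
    if not isinstance(doc.get("id"), int):
        problems.append("ROI id must be an integer")
    shapes = doc.get("shapes")
    if not isinstance(shapes, list):
        problems.append("shapes must be a list")
        shapes = []
    for index, shape in enumerate(shapes):
        missing = REQUIRED_SHAPE_KEYS - set(shape)
        if missing:
            problems.append("shape %d is missing %s" % (index, sorted(missing)))
    return problems


good = {"type": "Roi", "id": 565,
        "shapes": [{"type": "Ellipse", "z": 0, "t": 0, "id": 3}]}
bad = {"type": "Roi", "id": 566, "shapes": [{"type": "Ellipse"}]}
print(validate_roi(good))
print(validate_roi(bad))
```

A caller would run the check first and only insert documents for which the returned list is empty.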

MongoDB - Deployments

Deployments
SourceForge: http://sourceforge.net/
BusinessInsider: http://www.businessinsider.com/
New York Times: http://www.nytimes.com/
Disqus: http://www.disqus.com/

MongoDB – ROI Use cases

Human Interaction

Merge, Propagate, Split ✓

Geometry ✓

Intensity ✓

Path ✓

ROI/ROI Links ✓

Tags ✓

HCS

Many ROI ✓

Tags on ROI ✓

Tags on Measurement ✓

Tables of Measurements ✓

Externally Generated

Tags ✓

ROI/ROI Links, ROI/Image Links

Many formats, unknown types ✓

Other

N-Dimensional ROI ✓

Hierarchical Structures ✓

MongoDB – Example insert

from pymongo import MongoClient

# Connect to a MongoDB server on the default host and port.
# (MongoClient replaces the older Connection class.)
connection = MongoClient()
db = connection['databaseName']
collection = db['collectionName']

# Insert one ROI document holding two ellipse shapes, each with its own tags.
collection.insert_one({
    "tags": [],
    "label": "MyROI",
    "shapes": [{
        "tags": [{"tag": "foo1", "namespace": "bob"}],
        "rx": 17, "ry": 17, "label": None,
        "cy": 75, "cx": 3, "t": 0, "z": 0,
        "type": "Ellipse", "id": 3
    }, {
        "tags": [{"tag": "foo2", "namespace": "bob"}],
        "rx": 10, "ry": 16, "label": None,
        "cy": 82, "cx": 45, "t": 0, "z": 0,
        "type": "Ellipse", "id": 5
    }],
    "type": "Roi",
    "id": 565
})

MongoDB – Example query

import re
from pymongo import MongoClient

connection = MongoClient()
db = connection['databaseName']
collection = db['collectionName']

# Find ROI shapes with a tag containing "mitosis" (case-insensitive).
# A compiled pattern is needed; the string '/.*mitosis.*/i' would only match literally.
collection.find({"shapes.tags.tag": re.compile("mitosis", re.IGNORECASE)})

# Find ROI with tag foofoo and shapes with tag foo1.
collection.find({"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})
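The dotted keys in these queries ("shapes.tags.tag") reach into lists of embedded documents. The pure-Python sketch below imitates that matching on plain dictionaries so the behaviour can be tried without a server. It is a simplification written for illustration, not MongoDB's implementation; `values_at` and `matches` are hypothetical helper names.

```python
import re


def values_at(doc, path):
    """Collect every value reachable through a dotted path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    # A final value that is itself a list contributes each of its elements.
    result = []
    for item in current:
        if isinstance(item, list):
            result.extend(item)
        else:
            result.append(item)
    return result


def matches(doc, query):
    """True if every path in the query has at least one matching value."""
    for path, expected in query.items():
        values = values_at(doc, path)
        if hasattr(expected, "search"):  # compiled regular expression
            ok = any(isinstance(v, str) and expected.search(v) for v in values)
        else:
            ok = expected in values
        if not ok:
            return False
    return True


roi = {
    "tags": [{"tag": "foofoo", "namespace": "bob"}],
    "shapes": [
        {"tags": [{"tag": "foo1", "namespace": "bob"}], "type": "Ellipse"},
        {"tags": [{"tag": "Early Mitosis", "namespace": "bob"}], "type": "Ellipse"},
    ],
}

print(matches(roi, {"shapes.tags.tag": "foo1", "tags.tag": "foofoo"}))
print(matches(roi, {"shapes.tags.tag": re.compile("mitosis", re.IGNORECASE)}))
```

Both conditions in the first query must hold for the same ROI document, but they may be satisfied by different shapes inside it.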

Neo4J

Graph Database
Uses nodes to represent objects
User specifies relationships between nodes
Allows complex traversal of node structures

Neo4J – Pros and Cons

Pros
Handles graph structures nicely
Transactional
Supported by Gremlin
Native RDF: http://components.neo4j.org/neo-rdf-sail/
Easy to install

Cons
No C++ language binding.
Not distributed.
Tables are not so easily modeled.
Difficult to query on node contents

Neo4J - Deployments

Deployments
The Swedish Defence forces: http://www.mil.se
Windh Technologies: http://www.windh.com
Flextoll: http://www.flextoll.se

Neo4J - Example

// Relationship types used to link OMERO objects in the graph.
public enum OMERORelations implements RelationshipType {
    ASSOCIATE, DERIVE, AGGREGATE, COMPOSE
}

// Node for the source image.
Node image = neo.createNode();
image.setProperty("IObject", imageI);
image.setProperty("id", imageI.getId().getValue());
image.setProperty("name", imageI.getName().getValue());

// Node for the image derived from it.
Node derivedImage = neo.createNode();
derivedImage.setProperty("IObject", derivedImageI);
derivedImage.setProperty("id", derivedImageI.getId().getValue());
derivedImage.setProperty("name", derivedImageI.getName().getValue());

// DERIVE relationship recording that the derived image was cropped using an ROI.
Relationship relationship = image.createRelationshipTo(derivedImage, OMERORelations.DERIVE);
relationship.setProperty("type", "ROI");
relationship.setProperty("operation", "crop");
relationship.setProperty("roi", cropRoiI);
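The traversal that makes a graph store attractive here can be illustrated with a small in-memory model. This Python sketch is not Neo4J's API; the `Graph` class, its methods and the sample node ids are assumptions made purely to show what following typed relationships (such as DERIVE) between image nodes involves.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph with typed, directed relationships."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.edges = defaultdict(list)  # node id -> [(type, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, target, rel_type, **props):
        self.edges[source].append((rel_type, target, props))

    def traverse(self, start, rel_type):
        """Return ids reachable from start via rel_type, in breadth-first order."""
        found, queue, seen = [], [start], {start}
        while queue:
            current = queue.pop(0)
            for kind, target, _props in self.edges.get(current, []):
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append(target)
        return found


graph = Graph()
graph.add_node(1, name="original")
graph.add_node(2, name="cropped")
graph.add_node(3, name="cropped again")
graph.add_node(4, name="unrelated")
graph.relate(1, 2, "DERIVE", type="ROI", operation="crop")
graph.relate(2, 3, "DERIVE", type="ROI", operation="crop")
graph.relate(1, 4, "ASSOCIATE")

print(graph.traverse(1, "DERIVE"))
```

Following only DERIVE edges from the original image yields every image cropped from it, directly or indirectly, while the ASSOCIATE edge is ignored.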

Neo4J – ROI Use cases

Human Interaction

Merge, Propagate, Split ✓

Geometry

Intensity

Path ✓

ROI/ROI Links ✓

Tags

HCS

Many ROI ✓

Tags on ROI ✓

Tags on Measurement ✓

Tables of Measurements

Externally Generated

Tags ✓

ROI/ROI Links, ROI/Image Links ✓

Many formats, unknown types

Other

N-Dimensional ROI

Hierarchical Structures ✓

Cassandra

An implementation of Google’s BigTable design: a complex key/value store used to represent a table.

A sophisticated toolset is required to get the most out of this kind of solution; for instance, Google created Sawzall to query its system. Digg has released a Python library for working with Cassandra called LazyBoy.

It works by creating a table whose columns are grouped into column families; like data lives in the same column family (for example, Ellipse ROI).
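The column-family layout can be pictured as nested maps: row key, then column family, then column name. The sketch below models that in plain Python to show how different rows may carry different columns within the same family. It is not the Cassandra client API; `ColumnFamilyTable`, the family names and the row-key format are all assumptions for illustration.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """In-memory model: row key -> column family -> column name -> value."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # Families are fixed when the table is created; columns are free-form.
        if family not in self.families:
            raise KeyError("unknown column family: %s" % family)
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Return a copy of all columns stored for one row in one family."""
        return dict(self.rows.get(row_key, {}).get(family, {}))


table = ColumnFamilyTable(["Ellipse", "Tags"])

# Two ellipse shapes; only the second one carries a tag column.
table.put("roi:565/shape:3", "Ellipse", "rx", 17)
table.put("roi:565/shape:3", "Ellipse", "ry", 17)
table.put("roi:565/shape:5", "Ellipse", "rx", 10)
table.put("roi:565/shape:5", "Ellipse", "ry", 16)
table.put("roi:565/shape:5", "Tags", "bob:foo2", True)

print(table.get_family("roi:565/shape:5", "Ellipse"))
print(table.get_family("roi:565/shape:3", "Tags"))
```

Rows are sparse: a shape without tags simply has no columns in the Tags family, which is how heterogeneous ROI data fits this model.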

Cassandra – Pros and Cons

Pros
Quick
Handles heterogeneous data well
▪ Different rows can have different columns
Can manage distributed data
▪ Map/Reduce
Focus on writes not reads
Scales nicely
Easy to install

Cons
Not simple to work with
▪ Building hierarchical structures
▪ Sorting
▪ Querying
▪ Ad hoc queries are bad; Digg still uses MySQL for certain queries.
Have to manage secondary indexes (K/V)
Version 0.5
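Managing secondary indexes by hand means the application keeps a second key/value map in step with the primary one on every write and delete. A minimal sketch of that bookkeeping, using plain dictionaries rather than any Cassandra client, might look like this. The `IndexedStore` class and its tag index are hypothetical and exist only to show the extra work involved.

```python
class IndexedStore:
    """Key/value store with a hand-maintained secondary index on tags."""

    def __init__(self):
        self.data = {}    # primary: key -> record
        self.by_tag = {}  # secondary: tag -> set of keys

    def put(self, key, record):
        # Remove any stale index entries before writing the new record.
        self.delete(key)
        self.data[key] = record
        for tag in record.get("tags", []):
            self.by_tag.setdefault(tag, set()).add(key)

    def delete(self, key):
        old = self.data.pop(key, None)
        if old is None:
            return
        for tag in old.get("tags", []):
            keys = self.by_tag.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self.by_tag[tag]

    def find_by_tag(self, tag):
        """Return the keys of all records carrying the tag, sorted."""
        return sorted(self.by_tag.get(tag, ()))


store = IndexedStore()
store.put("shape:3", {"type": "Ellipse", "tags": ["mitosis"]})
store.put("shape:5", {"type": "Ellipse", "tags": ["mitosis", "foo"]})
store.put("shape:3", {"type": "Ellipse", "tags": ["foo"]})  # re-tagging updates the index

print(store.find_by_tag("mitosis"))
print(store.find_by_tag("foo"))
```

Every write touches two structures, and forgetting the cleanup step in `put` would leave stale keys in the index, which is the maintenance cost this con refers to.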

Cassandra - Deployments

Deployments
Facebook (MAYBE!!): http://www.facebook.com
Digg: http://www.digg.com

Cassandra – ROI Use cases

Human Interaction

Merge, Propagate, Split ✓

Geometry ✓

Intensity ✓

Path

ROI/ROI Links

Tags ✓

HCS

Many ROI ✓

Tags on ROI ✓

Tags on Measurement ✓

Tables of Measurements ✓

Externally Generated

Tags ✓

ROI/ROI Links, ROI/Image Links ✓

Many formats, unknown types

Other

N-Dimensional ROI ✓

Hierarchical Structures

HyperTable

An implementation of Google’s BigTable design: a complex key/value store used to represent a table.

A sophisticated toolset is required to get the most out of this kind of solution; for instance, Google created Sawzall to query its system. HyperTable has a query language called HQL.

It works by creating a table whose columns are grouped into column families; like data lives in the same column family (for example, Ellipse ROI).

Hypertable – Pros and Cons

Pros
Quick
Handles heterogeneous data well
▪ Different rows can have different columns
Can manage distributed data
▪ Map/Reduce
Scales nicely
Easy to install

Cons
GPL License
Building hierarchical structures
Docs are weak
HQL works for simple queries only
▪ Map/Reduce for other work
Limit of 255 column families
Secondary keys

HyperTable- Deployments

Deployments
Rediff: http://www.rediff.com
Zvents: http://www.zvents.com/

HyperTable – ROI Use cases

Human Interaction

Merge, Propagate, Split ✓

Geometry ✓

Intensity ✓

Path

ROI/ROI Links

Tags ✓

HCS

Many ROI ✓

Tags on ROI ✓

Tags on Measurement ✓

Tables of Measurements ✓

Externally Generated

Tags ✓

ROI/ROI Links, ROI/Image Links ✓

Many formats, unknown types

Other

N-Dimensional ROI ✓

Hierarchical Structures

Are we Normal?

Why do we have an RDBMS?
We don’t normalise the data
Each import will normalise on:
▪ Image, ObjectiveSettings, LogicalChannel, LightSettings, DetectorSettings.
Object penalty
Difference between normalisation and view