Post on 19-Dec-2015
Alternative Storage: Regions of Interest
Overview
▪ What’s in an ROI?
▪ Use cases
▪ Requirements
▪ Current storage system
▪ Problems
▪ Alternative storage
What’s in an ROI?
▪ ROI geometry
▪ Measurements
▪ ROI on a channel
▪ Annotations
  ▪ on the ROI
  ▪ on measurements
  ▪ links
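The pieces listed above can be sketched as a small data model. This is an illustration only: the classes `Roi`, `Shape`, `Measurement`, `Annotation` and the helper `all_tags` are hypothetical names, not OMERO's model classes, and the values are made up. It shows that tags can hang off three different levels, which is what any storage back end has to support.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    tag: str
    namespace: str


@dataclass
class Shape:
    type: str                       # e.g. "Ellipse"
    z: int
    t: int
    geometry: Dict[str, float]      # type-specific parameters, e.g. cx, cy, rx, ry
    channel: Optional[int] = None   # None: the shape applies to every channel
    tags: List[Annotation] = field(default_factory=list)


@dataclass
class Measurement:
    name: str
    value: float
    tags: List[Annotation] = field(default_factory=list)


@dataclass
class Roi:
    id: int
    shapes: List[Shape] = field(default_factory=list)
    measurements: List[Measurement] = field(default_factory=list)
    tags: List[Annotation] = field(default_factory=list)
    links: List[int] = field(default_factory=list)  # ids of related ROI (ROI/ROI links)


def all_tags(roi: Roi) -> List[str]:
    """Collect the tags on the ROI, on its shapes and on its measurements."""
    tags = [a.tag for a in roi.tags]
    for shape in roi.shapes:
        tags.extend(a.tag for a in shape.tags)
    for m in roi.measurements:
        tags.extend(a.tag for a in m.tags)
    return tags


# Illustrative values only.
roi = Roi(id=565)
roi.shapes.append(Shape("Ellipse", z=0, t=0, geometry={"cx": 3, "cy": 75, "rx": 17, "ry": 17},
                        channel=1, tags=[Annotation("foo1", "bob")]))
roi.measurements.append(Measurement("area", 908.0, tags=[Annotation("analysis-v1", "bob")]))
roi.tags.append(Annotation("mitosis", "bob"))
```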
Use Cases
▪ User-created ROI: measurement tools
▪ HCS-generated ROI: automatic, external
▪ External analysis: particle tracking, other
▪ Templates: ROI without images
Use Cases – Human Generated
▪ More interactions: merge, propagate, split, delete
▪ Measurements: geometry, intensity, path
▪ ROI/ROI links
▪ Tags mostly on the ROI
▪ Write many / read many
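The merge, propagate and split interactions above can be sketched over ROI stored as plain documents, in the same shape as the MongoDB insert example later in this deck. The helpers `propagate`, `merge` and `split` are hypothetical and deliberately minimal: propagate copies a shape onto successive timepoints, merge combines two ROI, and split gives each shape its own ROI.

```python
import copy
from typing import Any, Dict, List

Shape = Dict[str, Any]
RoiDoc = Dict[str, Any]


def propagate(shape: Shape, timepoints: range) -> List[Shape]:
    """Copy one shape onto every timepoint in `timepoints` (the original is left untouched)."""
    copies = []
    for t in timepoints:
        s = copy.deepcopy(shape)
        s["t"] = t
        copies.append(s)
    return copies


def merge(a: RoiDoc, b: RoiDoc, new_id: int) -> RoiDoc:
    """Combine the shapes and ROI-level tags of two ROI into a new ROI."""
    return {
        "type": "Roi",
        "id": new_id,
        "shapes": copy.deepcopy(a["shapes"] + b["shapes"]),
        "tags": copy.deepcopy(a["tags"] + b["tags"]),
    }


def split(roi: RoiDoc, first_id: int) -> List[RoiDoc]:
    """Turn each shape of an ROI into its own ROI; every part keeps the parent's tags."""
    return [
        {"type": "Roi", "id": first_id + i,
         "shapes": [copy.deepcopy(shape)], "tags": copy.deepcopy(roi["tags"])}
        for i, shape in enumerate(roi["shapes"])
    ]


ellipse = {"type": "Ellipse", "cx": 3, "cy": 75, "rx": 17, "ry": 17, "z": 0, "t": 0, "tags": []}
track = propagate(ellipse, range(5))
a = {"type": "Roi", "id": 1, "shapes": track[:2], "tags": [{"tag": "cell-1", "namespace": "bob"}]}
b = {"type": "Roi", "id": 2, "shapes": track[2:], "tags": []}
merged = merge(a, b, new_id=3)
parts = split(merged, first_id=10)
```

Every operation creates new ROI rather than editing in place, which is why human-generated ROI are "write many": each interaction produces further writes.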
Use Cases – HCS
▪ HCS-generated ROI: lots of ROI
▪ Attached to a channel
▪ Measurements attached
  ▪ Multiple measurements per ROI
▪ Tags on ROI and on measurements
  ▪ Analysis, results and metadata
▪ Write once, read many
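The HCS pattern above (many ROI per channel, several measurements each, written once and then only read) can be sketched as a table built in a single pass and then frozen. The row layout `(roi_id, channel, name, value)`, the helpers `build_table` and `channel_mean`, and the numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from types import MappingProxyType


def build_table(rows):
    """Group (roi_id, channel, name, value) rows by channel in a single pass.

    The analysis writes the table once; afterwards it is only read, so the
    top-level mapping is returned as a read-only view.
    """
    by_channel = defaultdict(list)
    for roi_id, channel, name, value in rows:
        by_channel[channel].append({"roi": roi_id, "name": name, "value": value})
    return MappingProxyType({ch: tuple(ms) for ch, ms in by_channel.items()})


def channel_mean(table, channel, name):
    """Read path: average one named measurement over all ROI on a channel."""
    return mean(m["value"] for m in table[channel] if m["name"] == name)


rows = [
    (1, 0, "area", 120.0), (1, 0, "mean_intensity", 840.0),
    (2, 0, "area", 80.0), (2, 0, "mean_intensity", 610.0),
    (1, 1, "area", 118.0),
]
table = build_table(rows)
```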
Use Cases – External Tools
▪ External tools (and scripts) can generate ROI
▪ ROI can be tagged
▪ Links (ROI/ROI, ROI/Image)
▪ Results can be in any format
Use Cases – Templates
▪ ROI need not be attached to an image
▪ Template used to define other ROI
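A template ROI can be sketched as an ROI document with no image, whose shapes are placed relative to an origin; instantiating it copies the shapes onto a real image at an offset. The function `instantiate_template` is hypothetical, and it assumes every shape is positioned by a centre point `cx`/`cy`, as the ellipses in this deck are.

```python
import copy


def instantiate_template(template, image_id, dx, dy, new_id):
    """Place a copy of a template ROI on an image, shifting every shape by (dx, dy).

    Assumes each shape is positioned by a centre point "cx"/"cy".
    """
    roi = copy.deepcopy(template)
    roi["id"] = new_id
    roi["image"] = image_id
    for shape in roi["shapes"]:
        shape["cx"] += dx
        shape["cy"] += dy
    return roi


# A template ROI: not attached to any image, coordinates relative to an origin.
template = {
    "type": "Roi", "id": 1, "image": None,
    "tags": [{"tag": "well-template", "namespace": "bob"}],
    "shapes": [{"type": "Ellipse", "cx": 0, "cy": 0, "rx": 10, "ry": 10, "z": 0, "t": 0}],
}
roi = instantiate_template(template, image_id=42, dx=100, dy=50, new_id=7)
```

Because the template is deep-copied, the template itself stays detached and can be reused for any number of images.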
ROI from the Nth Dimension
▪ N-dimensional data
▪ Storage of image data is simple; ROI are more complex
  ▪ Database entry, file format
▪ We don’t just want to store in HDF
Current Storage Solutions
▪ Database: ROI, ROI annotations
▪ PyTables: mask ROI, measurements
Current Status
PyTables
▪ ROI are heterogeneous
▪ Concurrency: Python behind a core service call
▪ Measurements are optimal
▪ Tagging is an issue
  ▪ Inside the file
  ▪ Multiple annotations reported to be slow
Database
▪ ROI can be stored in the database
▪ Mask data can be an issue
▪ Tagging in an RDBMS is not ideal
  ▪ Many more annotations than we’d like
▪ Link to an external source for measurements
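Why tagging is awkward in a relational database can be sketched with SQLite: every (ROI, tag) pair becomes a row in a link table, and every tag lookup is a three-way join. The schema, the helpers `tag_roi` and `rois_with_tag`, and the `measurement_uri` value are hypothetical and are not the OMERO schema; the URI column stands in for the link to an external measurement source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roi (
    id INTEGER PRIMARY KEY,
    image_id INTEGER,
    measurement_uri TEXT            -- link to measurements stored elsewhere
);
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    ns TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (ns, value)
);
-- One row per (ROI, tag) pair: this link table grows with every annotation.
CREATE TABLE roi_tag (
    roi_id INTEGER REFERENCES roi(id),
    tag_id INTEGER REFERENCES tag(id),
    PRIMARY KEY (roi_id, tag_id)
);
""")


def tag_roi(conn, roi_id, ns, value):
    """Attach a tag to an ROI, creating the tag row on first use."""
    conn.execute("INSERT OR IGNORE INTO tag (ns, value) VALUES (?, ?)", (ns, value))
    (tag_id,) = conn.execute(
        "SELECT id FROM tag WHERE ns = ? AND value = ?", (ns, value)).fetchone()
    conn.execute("INSERT OR IGNORE INTO roi_tag (roi_id, tag_id) VALUES (?, ?)", (roi_id, tag_id))


def rois_with_tag(conn, ns, value):
    """Every tag lookup is a three-way join across roi, roi_tag and tag."""
    rows = conn.execute("""
        SELECT r.id FROM roi AS r
        JOIN roi_tag AS rt ON rt.roi_id = r.id
        JOIN tag AS t ON t.id = rt.tag_id
        WHERE t.ns = ? AND t.value = ?
        ORDER BY r.id""", (ns, value))
    return [row[0] for row in rows]


conn.executemany(
    "INSERT INTO roi (id, image_id, measurement_uri) VALUES (?, ?, ?)",
    [(1, 10, "file:///data/measurements.h5#roi1"), (2, 10, None), (3, 11, None)],
)
tag_roi(conn, 1, "bob", "mitosis")
tag_roi(conn, 3, "bob", "mitosis")
tag_roi(conn, 2, "bob", "interphase")
```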
Alternative Storage
▪ Key-value pair stores: Berkeley DB, Project Voldemort, Tokyo Cabinet
▪ Document DB: MongoDB, CouchDB
▪ Graph DB: Neo4J, InfoGrid
▪ Table DB: Cassandra, Hypertable, HBase
Where others have gone before
Other opinions on the storage solutions:
▪ MongoDB vs CouchDB, Cassandra, ...
▪ CouchDB vs MongoDB
▪ Pros and cons of MongoDB
▪ Digg on Cassandra
▪ What is a supercolumn?
▪ Cassandra talk
▪ Indexing nodes in Neo4J
MongoDB
▪ Document database (NoSQL movement)
▪ Schemaless
▪ No tables
  ▪ Collections of like data
▪ No joins
  ▪ A document is the equivalent of a row of data
▪ Distributed file system (GridFS)
MongoDB – Pros and Cons
Pros
▪ Bindings for numerous languages (C++, C#, Java, Python, ...)
▪ Allows storage, indexing and linking of any user data
▪ Annotations are now very easy and efficient
▪ Has mechanisms for schema upgrade
▪ Dynamic queries
▪ Replication
▪ Sharding
▪ Map-Reduce framework
▪ Fast
▪ GridFS, a distributed file storage mechanism within Mongo
▪ Easy to install
Cons
▪ Schemaless, so data integrity will need to be worked on
▪ Graph structures are not inherently supported
MongoDB – Deployments
▪ SourceForge http://sourceforge.net/
▪ BusinessInsider http://www.businessinsider.com/
▪ New York Times http://www.nytimes.com/
▪ Disqus http://www.disqus.com/
MongoDB – ROI Use cases
Human Interaction
Merge, Propagate, Split ✓
Geometry ✓
Intensity ✓
Path ✓
ROI/ROI Links ✓
Tags ✓
HCS
Many ROI ✓
Tags on ROI ✓
Tags on Measurement ✓
Tables of Measurements ✓
Externally Generated
Tags ✓
ROI/ROI Links, ROI/Image Links
Many formats, unknown types ✓
Other
N-Dimensional ROI ✓
Hierarchical Structures ✓
MongoDB – Example insert
from pymongo import MongoClient

client = MongoClient()  # connects to localhost:27017 by default
db = client['databaseName']
collection = db['collectionName']

# One document per ROI; each shape carries its own geometry and tags.
collection.insert_one({
    "type": "Roi",
    "id": 565,
    "label": "MyROI",
    "tags": [],
    "shapes": [
        {"type": "Ellipse", "id": 3, "label": None,
         "cx": 3, "cy": 75, "rx": 17, "ry": 17, "z": 0, "t": 0,
         "tags": [{"tag": "foo1", "namespace": "bob"}]},
        {"type": "Ellipse", "id": 5, "label": None,
         "cx": 45, "cy": 82, "rx": 10, "ry": 16, "z": 0, "t": 0,
         "tags": [{"tag": "foo2", "namespace": "bob"}]},
    ],
})
MongoDB – Example query
import re
from pymongo import MongoClient

collection = MongoClient()['databaseName']['collectionName']

# Find ROI that have a shape whose tag contains "mitosis" (case-insensitive)
collection.find({"shapes.tags.tag": re.compile("mitosis", re.IGNORECASE)})

# Find ROI with tag "foofoo" and shapes with tag "foo1"
collection.find({"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})
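How a dotted path such as `shapes.tags.tag` reaches into nested arrays can be sketched with a small pure-Python matcher. This is a simplified model for illustration, not MongoDB's implementation: `resolve` and `matches` are hypothetical, only equality and regular-expression conditions are supported, and nested arrays are flattened more aggressively than MongoDB does.

```python
import re
from typing import Any, Dict, Iterator, List


def resolve(doc: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reached by following `path`, fanning out over arrays."""
    if isinstance(doc, list):
        for item in doc:                 # arrays of embedded documents: try every element
            yield from resolve(item, path)
    elif not path:
        yield doc
    elif isinstance(doc, dict) and path[0] in doc:
        yield from resolve(doc[path[0]], path[1:])


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Implicit AND over fields; each field matches if any reached value matches."""
    for dotted, cond in query.items():
        values = resolve(doc, dotted.split("."))
        if isinstance(cond, re.Pattern):
            ok = any(isinstance(v, str) and cond.search(v) for v in values)
        else:
            ok = any(v == cond for v in values)
        if not ok:
            return False
    return True


roi = {
    "type": "Roi", "id": 565, "label": "MyROI",
    "tags": [{"tag": "foofoo", "namespace": "bob"}],
    "shapes": [
        {"type": "Ellipse", "id": 3, "tags": [{"tag": "foo1", "namespace": "bob"}]},
        {"type": "Ellipse", "id": 5, "tags": [{"tag": "Early-Mitosis", "namespace": "bob"}]},
    ],
}
```

One consequence worth knowing: when several conditions target the same array, MongoDB lets different elements satisfy different conditions; `$elemMatch` is the operator that forces a single element to satisfy them all.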
Neo4J
▪ Graph database: uses nodes to represent objects
▪ User specifies relationships between nodes
▪ Allows complex traversal of node structures
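The node/relationship/traversal idea above can be sketched as an in-memory property graph. This is not the Neo4J API: `Graph`, `Rel` and `traverse` are hypothetical, with `Rel` mirroring the relationship types of the `OMERORelations` enum in the Neo4J example. The traversal is a breadth-first walk that follows only one relationship type, e.g. to find every image derived from an original.

```python
from collections import defaultdict, deque
from enum import Enum


class Rel(Enum):
    """Mirrors the relationship types of the OMERORelations enum."""
    ASSOCIATE = "ASSOCIATE"
    DERIVE = "DERIVE"
    AGGREGATE = "AGGREGATE"
    COMPOSE = "COMPOSE"


class Graph:
    def __init__(self):
        self.nodes = {}                 # node id -> property dict
        self.out = defaultdict(list)    # node id -> [(relationship type, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, dst, rel, **props):
        self.out[src].append((rel, dst, props))

    def traverse(self, start, rel):
        """Breadth-first walk from `start` following only `rel` edges; yields node ids."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for r, dst, _props in self.out[node]:
                if r is rel and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
                    yield dst


g = Graph()
g.add_node(1, name="original.tif")
g.add_node(2, name="crop-1")
g.add_node(3, name="crop-1a")
g.add_node(4, name="notes.txt")
g.relate(1, 2, Rel.DERIVE, operation="crop", roi=565)
g.relate(2, 3, Rel.DERIVE, operation="crop", roi=566)
g.relate(1, 4, Rel.ASSOCIATE)
```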
Neo4J – Pros and Cons
Pros
▪ Handles graph structures nicely
▪ Transactional
▪ Supported by Gremlin
▪ Native RDF (http://components.neo4j.org/neo-rdf-sail/)
▪ Easy to install
Cons
▪ No C++ language binding
▪ Not distributed
▪ Tables are not so easily modeled
▪ Difficult to query on node contents
Neo4J – Deployments
▪ The Swedish Defence forces http://www.mil.se
▪ Windh Technologies http://www.windh.com
▪ Flextoll http://www.flextoll.se
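Neo4J – Example (Java)
// Relationship types used to connect OMERO objects in the graph
public enum OMERORelations implements RelationshipType {
    ASSOCIATE, DERIVE, AGGREGATE, COMPOSE
}

// Writes to the graph must happen inside a transaction.
Transaction tx = neo.beginTx();
try {
    // Property values must be primitives, Strings or arrays of them,
    // so store the OMERO ids and names rather than the IObject itself.
    Node image = neo.createNode();
    image.setProperty("id", imageI.getId().getValue());
    image.setProperty("name", imageI.getName().getValue());

    Node derivedImage = neo.createNode();
    derivedImage.setProperty("id", derivedImageI.getId().getValue());
    derivedImage.setProperty("name", derivedImageI.getName().getValue());

    // The derived image was produced by cropping the original with an ROI.
    Relationship relationship = image.createRelationshipTo(derivedImage, OMERORelations.DERIVE);
    relationship.setProperty("type", "ROI");
    relationship.setProperty("operation", "crop");
    relationship.setProperty("roiId", cropRoiI.getId().getValue());
    tx.success();
} finally {
    tx.finish();
}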
Neo4J – ROI Use cases
Human Interaction
Merge, Propagate, Split ✓
Geometry
Intensity
Path ✓
ROI/ROI Links ✓
Tags
HCS
Many ROI ✓
Tags on ROI ✓
Tags on Measurement ✓
Tables of Measurements
Externally Generated
Tags ✓
ROI/ROI Links, ROI/Image Links ✓
Many formats, unknown types
Other
N-Dimensional ROI
Hierarchical Structures ✓
Cassandra
An implementation of Google’s Bigtable: a sophisticated key/value store used to represent a table.
A sophisticated toolset is required to get the most out of this solution; for instance, Google has created Sawzall to query its system. Digg has released a Python library for working with Cassandra called LazyBoy.
Works by creating a table whose columns are grouped into column families; similar data lives in the same column family (e.g. Ellipse ROI).
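The column-family layout described above can be sketched as nested maps: row key, then column family, then column name, then value. `ColumnFamilyStore` and the `roi565/shape3` row-key scheme are hypothetical; this is a toy model of the data layout, not Cassandra's client API. It shows that families are declared up front while rows in the same family may carry different columns.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy Bigtable-style layout: row key -> column family -> column name -> value."""

    def __init__(self, families):
        self.families = set(families)              # families are declared up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def insert(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family):
        """Columns of one family for one row, ordered by column name."""
        columns = self.rows.get(row_key, {}).get(family, {})
        return dict(sorted(columns.items()))


store = ColumnFamilyStore(["Ellipse", "Rectangle"])
for column, value in {"cx": 3, "cy": 75, "rx": 17, "ry": 17, "z": 0, "t": 0}.items():
    store.insert("roi565/shape3", "Ellipse", column, value)
for column, value in {"x": 10, "y": 20, "width": 30, "height": 40}.items():
    store.insert("roi570/shape1", "Rectangle", column, value)
```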
Cassandra – Pros and Cons
Pros
▪ Quick
▪ Handles heterogeneous data well
  ▪ Different rows can have different columns
▪ Can manage distributed data
▪ Map/Reduce
▪ Optimised for writes rather than reads
▪ Scales nicely
▪ Easy to install
Cons
▪ Not simple to work with
  ▪ Building hierarchical structures
  ▪ Sorting
  ▪ Querying
    ▪ Ad hoc queries are bad; Digg still uses MySQL for certain queries
▪ Secondary indexes have to be managed by hand (as key/value pairs)
▪ Still at version 0.5
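Managing a secondary index by hand, as listed in the cons above, can be sketched with two maps: the primary store keyed by ROI and an index from tag to ROI keys. `TaggedRoiStore` is hypothetical and in-memory; in a real deployment the index would itself have to live in the database (e.g. as another column family). The point is that every write must also keep the index in step, including removing stale entries when an ROI is retagged.

```python
from collections import defaultdict


class TaggedRoiStore:
    """ROI kept under their primary key, plus a secondary index maintained by hand."""

    def __init__(self):
        self.primary = {}                # row key -> ROI document
        self.by_tag = defaultdict(set)   # secondary index: tag -> row keys

    def put(self, key, roi):
        old = self.primary.get(key)
        if old is not None:              # drop index entries for the version being replaced
            for tag in old["tags"]:
                self.by_tag[tag].discard(key)
        self.primary[key] = roi
        for tag in roi["tags"]:
            self.by_tag[tag].add(key)

    def find_by_tag(self, tag):
        return sorted(self.by_tag.get(tag, set()))


store = TaggedRoiStore()
store.put("roi:1", {"tags": ["mitosis"]})
store.put("roi:2", {"tags": ["mitosis", "interphase"]})
store.put("roi:1", {"tags": ["interphase"]})    # retagging must update the index too
```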
Cassandra – Deployments
▪ Facebook (maybe!) http://www.facebook.com
▪ Digg http://www.digg.com
Cassandra – ROI Use cases
Human Interaction
Merge, Propagate, Split ✓
Geometry ✓
Intensity ✓
Path
ROI/ROI Links
Tags ✓
HCS
Many ROI ✓
Tags on ROI ✓
Tags on Measurement ✓
Tables of Measurements ✓
Externally Generated
Tags ✓
ROI/ROI Links, ROI/Image Links ✓
Many formats, unknown types
Other
N-Dimensional ROI ✓
Hierarchical Structures
Hypertable
An implementation of Google’s Bigtable: a sophisticated key/value store used to represent a table.
A sophisticated toolset is required to get the most out of this solution; for instance, Google has created Sawzall to query its system. Hypertable has a query language called HQL.
Works by creating a table whose columns are grouped into column families; similar data lives in the same column family (e.g. Ellipse ROI).
Hypertable – Pros and Cons
Pros
▪ Quick
▪ Handles heterogeneous data well
  ▪ Different rows can have different columns
▪ Can manage distributed data
▪ Map/Reduce
▪ Scales nicely
▪ Easy to install
Cons
▪ GPL license
▪ Building hierarchical structures
▪ Docs are weak
▪ HQL works for simple queries only
  ▪ Map/Reduce needed for other work
▪ Limit of 255 column families
▪ No secondary keys
Hypertable – Deployments
▪ Rediff http://www.rediff.com
▪ Zvents http://www.zvents.com/
Hypertable – ROI Use cases
Human Interaction
Merge, Propagate, Split ✓
Geometry ✓
Intensity ✓
Path
ROI/ROI Links
Tags ✓
HCS
Many ROI ✓
Tags on ROI ✓
Tags on Measurement ✓
Tables of Measurements ✓
Externally Generated
Tags ✓
ROI/ROI Links, ROI/Image Links ✓
Many formats, unknown types
Other
N-Dimensional ROI ✓
Hierarchical Structures
Are We Normal?
▪ Why do we have an RDBMS?
▪ We don’t normalise the data
▪ Each import will normalise on:
  ▪ Image, ObjectiveSettings, LogicalChannel, LightSettings, DetectorSettings
▪ Object penalty
▪ Difference between normalisation and view