Managing "Big Data" Application Complexity with CloudGraph

Posted on 27-Jan-2015


Analysis of, and solutions for, problems faced by HBase™ and other columnar data store client applications under the ever-increasing demand for domain model complexity


Managing “Big Data” Application Complexity using CloudGraph®

Scott Cinnamond, TerraMeta Software Inc., http://cloudgraph.org


Complexity Increases With Added Data Model Entities

[Figure: complexity (for columnar data store client applications) versus number of model entities / classes]

Why More App Complexity? (With Added Data Model Entities)

1. Difficult Column Mapping

2. Composite Row Key Mapping, Hashing, Salting and Formatting

3. Persistence Code Development, Refactoring and Maintenance

Typical Column Mapping Strategies

• Hard-Coded Names Embedded in Source Code
  – Not good

• Column Names in Java Constants File(s)
  – Better, but still effectively hard-coded
  – Feasible with 5-10 entities and 50 attributes
  – With 500-1000 entities and 5000+ attributes? Not maintainable

• Custom XML Configuration
  – Create a “meta model” using, say, XML Schema and JAXB
  – Construct unique names and refer to them in source
  – Better, but an application-specific “one-off”
  – Does not solve “state” management challenges

CloudGraph Column Mapping: A Standards-Based Approach Using SDO and UML

[Diagram: UML name “aliases” (business names; logical names, readable; physical names, terse), SDO metadata “repository”, data graph “state”, CloudGraph stateful column key factories, Java byte[] accessors, caching, object pooling, sequence management, entity ID mapping, row key mapping, marshalling]
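The logical-to-physical naming idea in the diagram can be sketched in plain Java: application code names a column by entity and readable logical name, and a factory resolves the terse physical qualifier and caches its byte[] form. This is a minimal illustration under stated assumptions, not the CloudGraph API; the classes `PropertyAlias` and `ColumnKeyFactory` and the alias of `Atom.formalCharge` to `fc` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical alias entry: a readable logical name paired with the terse
// physical column qualifier that is actually stored in the table.
final class PropertyAlias {
    final String logicalName;
    final String physicalName;

    PropertyAlias(String logicalName, String physicalName) {
        this.logicalName = logicalName;
        this.physicalName = physicalName;
    }
}

// Hypothetical column key factory: callers ask for a column by entity and
// logical property name and never handle the physical qualifier directly.
final class ColumnKeyFactory {
    // Populated once at startup from metadata; not safe for concurrent registration.
    private final Map<String, PropertyAlias> aliases = new HashMap<>();
    // Encoded qualifiers are cached so hot paths do not re-encode strings.
    private final Map<String, byte[]> qualifierCache = new ConcurrentHashMap<>();

    void register(String entity, PropertyAlias alias) {
        aliases.put(entity + "." + alias.logicalName, alias);
    }

    // Returns the cached, shared array: callers must treat it as read-only.
    byte[] qualifier(String entity, String logicalName) {
        return qualifierCache.computeIfAbsent(entity + "." + logicalName, key -> {
            PropertyAlias alias = aliases.get(key);
            if (alias == null) {
                throw new IllegalArgumentException("No alias registered for " + key);
            }
            return alias.physicalName.getBytes(StandardCharsets.UTF_8);
        });
    }
}
```

With this split, renaming or shortening a physical column touches only the alias metadata; code that asks for `qualifier("Atom", "formalCharge")` is unchanged.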

Great, But How Do We Keep Column Names Entirely Out of CRUD Source Code?

• Create | Update | Delete: CloudGraph SDO API (Service Data Objects)

• Read (Query): CloudGraph Query DSL (Domain Specific Language)

CloudGraph SDO: your complex domain model as a (create | update | delete) API

• Drives all Column Mapping Transparently
• Granular Control over Data Graph Edits
• Convenient “Create Entity” Factory Methods
• Change Tracking, Including History
• Rich Built-In Data Types
• 100% Compile-Time Checking
• Supports Multiple Inheritance Models
• Currently Uses PlasmaSDO™
  – See http://plasma-sdo.org
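The change-tracking and compile-time-checking bullets can be illustrated with a small sketch: a generated-style class exposes typed accessors, while an untyped data object underneath records every effective edit with its old and new values. All names here (`TrackedDataObject`, `Change`, `Atom`, `elementType`) are hypothetical; this is not the PlasmaSDO or SDO API, whose change summaries are considerably richer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// One recorded edit to a data object property.
final class Change {
    final String property;
    final Object oldValue;
    final Object newValue;

    Change(String property, Object oldValue, Object newValue) {
        this.property = property;
        this.oldValue = oldValue;
        this.newValue = newValue;
    }
}

// Hypothetical untyped data object that records every effective edit.
final class TrackedDataObject {
    private final Map<String, Object> values = new HashMap<>();
    private final List<Change> history = new ArrayList<>();

    void set(String property, Object value) {
        Object old = values.get(property);
        if (Objects.equals(old, value)) {
            return; // setting the same value again is not recorded as a change
        }
        values.put(property, value);
        history.add(new Change(property, old, value));
    }

    Object get(String property) {
        return values.get(property);
    }

    List<Change> history() {
        return Collections.unmodifiableList(history);
    }
}

// Hypothetical generated class: typed accessors give compile-time checking
// and delegate storage and change tracking to the untyped data object.
final class Atom {
    private final TrackedDataObject data = new TrackedDataObject();

    void setElementType(String elementType) {
        data.set("elementType", elementType);
    }

    String getElementType() {
        return (String) data.get("elementType");
    }

    List<Change> history() {
        return data.history();
    }
}
```

Because the string key appears only inside generated code, a typo in application code (say, `setElementTyp`) is a compile error rather than a silently wrong column.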

CloudGraph SDO API Example

Uses Chemical Markup Language (CML) 2.4

https://github.com/cloudgraph/cml

CloudGraph Query DSL: your complex domain model as a query API

• Drives all Column Mapping Transparently
• Intuitive, Almost “Fluent” English Appearance
• Logical Entity and Attribute Names Generated into API
• 100% Compile-Time Checking
• Currently Uses PlasmaQuery®
  – See http://plasma-query.org
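A rough sketch of how a generated, fluent query API can give compile-time checking: each model attribute becomes a typed `Property<T>` field, so `q.count.eq("x")` would not compile. `QMolecule`, `Property` and `Predicate` are hypothetical and only render a readable query string; this is not the PlasmaQuery API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// A single comparison such as: title = 'benzene'
final class Predicate {
    private final String property;
    private final String operator;
    private final Object value;

    Predicate(String property, String operator, Object value) {
        this.property = property;
        this.operator = operator;
        this.value = value;
    }

    @Override
    public String toString() {
        String rendered = value instanceof String ? "'" + value + "'" : String.valueOf(value);
        return property + " " + operator + " " + rendered;
    }
}

// Typed handle for one model property. A generator would emit one per
// attribute, so a misspelled name or wrongly typed value fails to compile.
final class Property<T> {
    private final String name;

    Property(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    Predicate eq(T value) {
        return new Predicate(name, "=", value);
    }

    Predicate gt(T value) {
        return new Predicate(name, ">", value);
    }
}

// Hypothetical generated query class for a Molecule entity.
final class QMolecule {
    final Property<String> title = new Property<>("title");
    final Property<Integer> count = new Property<>("count");

    private final List<String> selected = new ArrayList<>();
    private final List<Predicate> predicates = new ArrayList<>();

    static QMolecule newQuery() {
        return new QMolecule();
    }

    QMolecule select(Property<?>... properties) {
        for (Property<?> p : properties) {
            selected.add(p.name());
        }
        return this;
    }

    // Multiple where(...) calls are combined with "and".
    QMolecule where(Predicate predicate) {
        predicates.add(predicate);
        return this;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("select ")
                .append(String.join(", ", selected))
                .append(" from Molecule");
        if (!predicates.isEmpty()) {
            StringJoiner joiner = new StringJoiner(" and ", " where ", "");
            for (Predicate p : predicates) {
                joiner.add(p.toString());
            }
            sb.append(joiner);
        }
        return sb.toString();
    }
}
```

For example, `QMolecule q = QMolecule.newQuery(); q.select(q.title, q.count).where(q.title.eq("benzene"));` renders `select title, count from Molecule where title = 'benzene'`. A real implementation would translate the same object tree into scans and filters rather than a string.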

CloudGraph Query DSL Example

Uses Chemical Markup Language (CML) 2.4

https://github.com/cloudgraph/cml

Why More Complexity? 2.) Composite Row Key Mapping, Hashing and Formatting

• More Model Entities:
  – Larger data graphs
  – More composite row key fields needed to find graphs
  – How to map reliably “deep” into graphs

• Row Key Field Hashing and Formatting
  – Critical for HBase partial-key scan API
  – Many data-type-specific idiosyncrasies
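The hashing and formatting concerns can be made concrete with a minimal sketch of a composite row key laid out as `hash(source)|source|id`. The fixed-width hash field spreads different sources across the key space, zero padding makes byte order agree with numeric order, and a prefix built from the leading fields is what a partial-key scan would use as its start row. The layout, delimiter, field widths and the `CompositeRowKey` class are assumptions for illustration, not CloudGraph's key format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical composite row key layout:  hash(source) | source | id
// Assumes field values never contain the delimiter character.
final class CompositeRowKey {
    static final char DELIMITER = '|';
    static final int ID_WIDTH = 10;

    // Leading hash field: always 8 hex digits, and different sources land
    // in different parts of the key space instead of clustering together.
    static String hashField(String value) {
        return String.format("%08x", value.hashCode());
    }

    // Zero-padded so lexicographic (byte) order matches numeric order.
    static String idField(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("negative id: " + id);
        }
        String padded = String.format("%0" + ID_WIDTH + "d", id);
        if (padded.length() > ID_WIDTH) {
            throw new IllegalArgumentException("id too wide: " + id);
        }
        return padded;
    }

    static byte[] rowKey(String source, long id) {
        String key = hashField(source) + DELIMITER + source + DELIMITER + idField(id);
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Partial key covering only the leading fields; the trailing delimiter
    // keeps "word" from matching rows whose source is "wordnet".
    static byte[] partialKey(String source) {
        String prefix = hashField(source) + DELIMITER + source + DELIMITER;
        return prefix.getBytes(StandardCharsets.UTF_8);
    }

    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
    }
}
```

Without padding, `"9"` sorts after `"10"`; with a ten-digit pad, `0000000009` sorts before `0000000010`. Because the hash is derived from the source value itself, a client that knows the source can still rebuild the prefix for a scan.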

CloudGraph HBase Composite Row Keys: A Configuration-Driven Approach Using SDO XPath

[Diagram: configuration, SDO XPath and scan support around CloudGraph composite row keys; components include hashing, formatting, delimiters, expressions, field mapping, deep graph traversal, partial key assembly, fuzzy row filter, hierarchical row filters and field ordering]
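Fuzzy row filtering, one of the scan-support items listed for composite row keys, relies on fixed-width key fields: a template marks which byte positions must match and which may hold anything. The sketch is a simplified, standalone re-implementation of that matching rule (the mask convention follows HBase's `FuzzyRowFilter`, where 0 marks a fixed byte and 1 a free one); it is not HBase code, omits the filter's seek hints, and the `FuzzyRowMatch` class and `????|CML|` pattern are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Simplified per-byte fuzzy matching: mask 0 = fixed byte that must equal
// the template, mask 1 = free byte that may hold any value.
// Only answers match / no match; no seek hints are computed.
final class FuzzyRowMatch {
    // Builds a template from an ASCII pattern where '?' marks a free position.
    static byte[] template(String pattern) {
        return pattern.replace('?', '\0').getBytes(StandardCharsets.US_ASCII);
    }

    // Builds the matching mask: 1 for each '?', 0 for every other character.
    static byte[] mask(String pattern) {
        byte[] mask = new byte[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            mask[i] = (byte) (pattern.charAt(i) == '?' ? 1 : 0);
        }
        return mask;
    }

    // In this sketch, rows shorter than the template never match and
    // longer rows are matched on their leading bytes only.
    static boolean matches(byte[] row, byte[] template, byte[] mask) {
        if (template.length != mask.length) {
            throw new IllegalArgumentException("template and mask lengths differ");
        }
        if (row.length < template.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (mask[i] == 0 && row[i] != template[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For the pattern `????|CML|`, the row `2014|CML|0042` matches and `2014|HL7|0042` does not, whatever the four leading bytes hold. This only works if every field before the fixed part has a known width, which is one reason row key formatting matters.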

Why More Complexity? 3.) Persistence Code Development, Refactoring and Maintenance

Small Domain Model (e.g. CML, 164 Entities): 95,000 Lines
“Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines

* Example from UML conversion from XML Schema of BIOXSD – see http://bioxsd.org/
** Example from UML adaptation of HL7 POCD/HD000040 Clinical Document
*** Example from UML conversion from XML Schema of Chemical Markup Language 2.4 – see http://xml-cml.org
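Dividing the two quoted line counts by their entity counts gives nearly the same cost per entity, which suggests (from these two data points alone) that persistence code grows roughly linearly with the number of model entities:

```latex
\frac{95{,}000\ \text{lines}}{164\ \text{entities}} \approx 579\ \text{lines/entity},
\qquad
\frac{174{,}000\ \text{lines}}{300\ \text{entities}} = 580\ \text{lines/entity}
```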

CloudGraph Code Generation: A Contract-First Approach in 4 Steps

1. Leverage Existing or Create UML Model(s)
   – Can be automatically reverse engineered from an existing RDBMS schema
2. Map Repository Namespaces to Service Configurations
3. Define and Map Row Keys to Data Graphs
4. Add CloudGraph and Plasma Maven Artifacts and Generate Code
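The contract-first idea behind the generation step can be sketched as a generator that turns an entity definition from the model into Java interface source, so persistence-facing code is produced rather than written by hand. `InterfaceGenerator` and its input format (an ordered property-to-type map) are hypothetical; the actual CloudGraph and Plasma tooling works from UML/SDO metadata via Maven and is not reproduced here.

```java
import java.util.Map;

// Hypothetical contract-first generator: turns a model entity definition
// (name plus ordered property -> Java type map) into interface source code.
final class InterfaceGenerator {
    static String capitalize(String name) {
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    static String generate(String packageName, String entity, Map<String, String> properties) {
        StringBuilder src = new StringBuilder();
        src.append("package ").append(packageName).append(";\n\n");
        src.append("public interface ").append(entity).append(" {\n");
        // One getter and one setter per property, in the map's iteration order.
        for (Map.Entry<String, String> property : properties.entrySet()) {
            String type = property.getValue();
            String suffix = capitalize(property.getKey());
            src.append("    ").append(type).append(" get").append(suffix).append("();\n");
            src.append("    void set").append(suffix).append('(').append(type).append(" value);\n");
        }
        src.append("}\n");
        return src.toString();
    }
}
```

For `Molecule` with `title: String` and `count: Integer`, the output declares `String getTitle();`, `void setTitle(String value);` and the matching `count` accessors. Regenerating after a model change stands in for the manual refactoring that the line counts above make so expensive.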

Resources

• Exchange Model Examples
  – https://github.com/cloudgraph/cml
  – https://github.com/cloudgraph/bioxsd
  – https://github.com/cloudgraph/hl7

• End-to-End Examples
  – https://github.com/cloudgraph/wordnet
  – http://wordnet.cloudgraph.org

Status/Legal

• Project Status
  – CloudGraph® is currently in private beta testing
  – Services for Cassandra, MongoDB and other data stores are under analysis
  – See http://cloudgraph.org for contact info and other details

• Licensing
  – CloudGraph® 0.5.5 Community Edition (CE) is open source, licensed under version 2 of the GNU General Public License

• Trademarks
  – CloudGraph® is a registered trademark of TerraMeta Software LLC
  – Java™ is a trademark of Oracle Corporation
  – HBase™ is a trademark of the Apache Software Foundation

Copyright © TerraMeta Software, Inc. 2012, 2013. All Rights Reserved.

References

• BIOXSD – http://bioxsd.org
• Chemical Markup Language (CML) – http://xml-cml.org
• Health Level 7 (HL7) – http://hl7.org
• Apache HBase™ – http://hbase.apache.org
• Apache Cassandra – http://cassandra.apache.org
• MongoDB – http://www.mongodb.org
• PlasmaSDO™ – http://plasma-sdo.org, http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22plasma-sdo%22