Date posted: 27-Jan-2015
Category: Technology
Uploaded by: scott-cinnamond
Managing “Big Data” Application Complexity using CloudGraph®
Scott Cinnamond, TerraMeta Software Inc.
http://cloudgraph.org
Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever-increasing demand for domain model complexity
Complexity Increases With Added Data Model Entities
[Figure: complexity (for columnar data store client applications) plotted against # model entities / classes]
Why More App Complexity? (with Added Data Model Entities)
1. Column Mapping Difficult
2. Composite Row Key Mapping, Hashing, Salting and Formatting
3. Persistence Code Development, Refactoring and Maintenance
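As an illustration of the salting named in item 2, here is a minimal sketch in plain Java. It assumes string row keys and a fixed bucket count; the class name `RowKeySalter` and its method are hypothetical and are not part of CloudGraph or HBase.

```java
import java.util.Locale;

/** Hypothetical helper: prefixes a row key with a bucket number to spread writes. */
final class RowKeySalter {
    private RowKeySalter() {}

    /**
     * Returns the row key prefixed with a two-digit bucket derived from the key's own hash.
     * The salt is deterministic, so a reader can recompute it for a point lookup.
     */
    static String salt(String rowKey, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be between 1 and 100");
        }
        // floorMod keeps the bucket non-negative even when hashCode() is negative
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"order-1001", "order-1002", "order-1003"}) {
            System.out.println(salt(key, 16));
        }
    }
}
```

The trade-off is that sequential keys no longer sit next to each other, so a range scan has to be issued once per bucket and the results merged by the client.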
Typical Column Mapping Strategies
• Hard Coded Names Embedded in Source Code
– Not good
• Column Names in Java Constants File(s)
– Better, but still really hard coded
– Feasible with 5-10 entities, 50 attributes
– With 500-1000 entities and 5000+ attributes? Not maintainable
• Custom XML Configuration
– Create a “meta model” using, say, XML Schema and JAXB
– Construct unique names and refer to them in source
– Better, but an application-specific “one off”
– Does not solve “state” management challenges
CloudGraph Column Mapping
A Standards Based Approach Using SDO and UML
[Diagram: CloudGraph Stateful Column Key Factories. Labels recovered from the diagram:]
• UML Name “Aliases”
• SDO Metadata “Repository”
• Data Graph “State”
• Logical Names (readable)
• Physical Names (terse)
• Business Names
• Java byte[] accessors
• Caching
• Object Pooling
• Sequence Management
• Entity ID Mapping
• Row Key Mapping
• Marshalling
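The split between readable logical names and terse physical names can be sketched as a small alias registry. This is only an illustration in plain Java, under the assumption that each entity property is given one short alias; `ColumnAliasRegistry`, its methods and the sample aliases are hypothetical and do not reflect the actual CloudGraph key factories.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry mapping readable logical names to terse physical column names. */
final class ColumnAliasRegistry {
    private final Map<String, String> aliasByLogicalName = new HashMap<>();
    private final Map<String, byte[]> qualifierCache = new HashMap<>();

    /** Registers a terse physical alias for a logical entity property. */
    void register(String entity, String property, String physicalAlias) {
        if (aliasByLogicalName.containsValue(physicalAlias)) {
            throw new IllegalArgumentException("alias already in use: " + physicalAlias);
        }
        aliasByLogicalName.put(entity + "." + property, physicalAlias);
    }

    /** Returns the cached column qualifier bytes; callers must not modify the array. */
    byte[] qualifier(String entity, String property) {
        String logicalName = entity + "." + property;
        String alias = aliasByLogicalName.get(logicalName);
        if (alias == null) {
            throw new IllegalArgumentException("no alias registered for " + logicalName);
        }
        return qualifierCache.computeIfAbsent(
                logicalName, k -> alias.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ColumnAliasRegistry registry = new ColumnAliasRegistry();
        registry.register("Molecule", "formula", "m1");
        registry.register("Molecule", "title", "m2");
        byte[] q = registry.qualifier("Molecule", "formula");
        System.out.println(new String(q, StandardCharsets.UTF_8));
    }
}
```

Application code refers only to the logical names, so a physical alias can change in one place without touching CRUD code.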
Great, But How Do We Keep Column Names Entirely Out Of CRUD Source Code?
• Create | Update | Delete: CloudGraph SDO API (Service Data Objects)
• Read (Query): CloudGraph Query DSL (Domain Specific Language)
CloudGraph SDO
Your complex domain model as a (create | update | delete) API
• Drives all Column Mapping Transparently
• Granular Control over Data Graph Edits
• Convenient “Create Entity” Factory Methods
• Change Tracking Including History
• Rich Built In Data Types
• 100% Compile Time Checking
• Supports Multiple Inheritance Models
• Currently Uses PlasmaSDO™
– See http://plasma-sdo.org
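The idea behind change tracking with history can be sketched in a few lines of plain Java. This is an assumption-laden illustration, not the SDO or PlasmaSDO API: `TrackedEntity` and `Change` are hypothetical names, and properties are held in a simple map rather than in generated, typed classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical entity that records every property edit with its old and new value. */
final class TrackedEntity {
    /** One recorded edit. */
    static final class Change {
        final String property;
        final Object oldValue;
        final Object newValue;

        Change(String property, Object oldValue, Object newValue) {
            this.property = property;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final Map<String, Object> values = new LinkedHashMap<>();
    private final List<Change> history = new ArrayList<>();

    /** Sets a property and records the edit, unless the value is unchanged. */
    void set(String property, Object value) {
        Object old = values.put(property, value);
        if (!Objects.equals(old, value)) {
            history.add(new Change(property, old, value));
        }
    }

    Object get(String property) {
        return values.get(property);
    }

    /** Read-only view of all recorded edits, oldest first. */
    List<Change> history() {
        return Collections.unmodifiableList(history);
    }

    public static void main(String[] args) {
        TrackedEntity molecule = new TrackedEntity();
        molecule.set("formula", "H2O");
        molecule.set("formula", "H2O2");
        for (Change c : molecule.history()) {
            System.out.println(c.property + ": " + c.oldValue + " -> " + c.newValue);
        }
    }
}
```

With a record like this, a persistence layer can write only the columns that actually changed instead of rewriting the whole graph.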
CloudGraph SDO API Example
Uses Chemical Markup Language (CML) 2.4
https://github.com/cloudgraph/cml
CloudGraph Query DSL
Your complex domain model as a query API
• Drives all Column Mapping Transparently
• Intuitive Almost “Fluent” English Appearance
• Logical Entity, Attribute Names Generated into API
• 100% Compile Time Checking
• Currently Uses PlasmaQuery®
– See http://plasma-query.org
CloudGraph Query DSL Example
Uses Chemical Markup Language (CML) 2.4
https://github.com/cloudgraph/cml
Why More Complexity? 2.) Composite Row Key Mapping, Hashing and Formatting
• More Model Entities:
– Larger data graphs
– More composite row key fields, so that graphs can be found
– How to reliably map “deep” into graphs
• Row Key Field Hashing and Formatting
– Critical for HBase partial-key scan API
– Many data type specific idiosyncrasies
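To make the hashing and formatting concerns concrete, here is a minimal sketch of composite row key assembly in plain Java. HBase orders rows by comparing key bytes lexicographically, so numeric fields must be zero-padded to a fixed width or "10" would sort before "9". The field layout, the `|` delimiter, the use of MD5 and the class name `CompositeRowKey` are all assumptions made for illustration, not CloudGraph's configuration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Hypothetical composite key layout: hash(formula) | formula | zero-padded id. */
final class CompositeRowKey {
    static final String DELIMITER = "|";

    private CompositeRowKey() {}

    /** Zero-pads a non-negative number so that byte order matches numeric order. */
    static String padNumber(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    /** Returns the first hexChars hex digits (at most 32) of the MD5 digest of value. */
    static String hashPrefix(String value, int hexChars) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format(Locale.ROOT, "%02x", b));
            }
            return hex.substring(0, hexChars);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    /** Leading part of the key shared by every row for one formula. */
    static String prefix(String formula) {
        return hashPrefix(formula, 4) + DELIMITER + formula + DELIMITER;
    }

    /** Full row key bytes for one (formula, id) pair. */
    static byte[] build(String formula, long id) {
        return (prefix(formula) + padNumber(id, 10)).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new String(build("H2O", 42), StandardCharsets.UTF_8));
    }
}
```

Because the hash is computed from the formula itself, a client that knows the formula can rebuild the prefix and use it as the starting point of a partial-key scan.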
CloudGraph HBase Composite Row Keys
A Configuration Driven Approach using SDO XPath
[Diagram: CloudGraph Composite Row Keys. Labels recovered from the diagram:]
• Configuration
• SDO XPath
• Scan Support
• Hashing
• Formatting
• Delimiters
• Expressions
• Field Mapping
• Deep Graph Traversal
• Partial Key Assembly
• Fuzzy Row Filter
• Hierarchical Row Filters
• Field Ordering
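Partial key assembly for scan support can be sketched as computing a start row and a stop row from a key prefix. The example below uses plain Java byte arrays and a widely used technique (increment the last byte that is not 0xFF); the class name `PrefixScanRange` is hypothetical, and this is not CloudGraph's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper that turns a row key prefix into scan boundaries. */
final class PrefixScanRange {
    private PrefixScanRange() {}

    /** The start row of a prefix scan is the prefix itself. */
    static byte[] startRowFor(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    /**
     * Returns the smallest key that sorts after every key beginning with prefix.
     * Trailing 0xFF bytes cannot be incremented, so they are dropped first.
     * An empty result means "no upper bound", which HBase treats as end of table.
     */
    static byte[] stopRowFor(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "ab|".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(stopRowFor(prefix), StandardCharsets.US_ASCII));
    }
}
```

For the prefix `ab|` the stop row is `ab}`, because `}` is the byte that follows `|`. The two arrays would then be passed to a scan as its inclusive start row and exclusive stop row.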
Why More Complexity? 3.) Persistence Code Development, Refactoring and Maintenance
• Small Domain Model (e.g. CML, 164 Entities): 95,000 Lines
• “Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines
* Example from UML conversion from XML Schema of BIOXSD – see http://bioxsd.org/
** Example from UML adaptation of HL7 POCD/HD000040 Clinical Document
*** Example from UML conversion from XML Schema of Chemical Markup Language 2.4 – see http://xml-cml.org
CloudGraph Code Generation
A contract-first approach in 4 steps
1. Leverage Existing or Create UML Model(s)
– Can be automatically reverse engineered from existing RDBMS Schema
2. Map Repository Namespaces to Service Configurations
3. Define and Map Row Keys To Data Graphs
4. Add CloudGraph and Plasma Maven Artifacts and Generate Code
Resources
• Exchange Model Examples
– https://github.com/cloudgraph/cml
– https://github.com/cloudgraph/bioxsd
– https://github.com/cloudgraph/hl7
• End To End Examples
– https://github.com/cloudgraph/wordnet
– http://wordnet.cloudgraph.org
Status/Legal
• Project Status
– CloudGraph® is currently in private beta testing
– Other services for Cassandra, MongoDB and others are under analysis
– See http://cloudgraph.org for contact info and other details
• Licensing
– CloudGraph® 0.5.5 Community Edition (CE) is open source licensed under version 2 of the GNU General Public License
• Trademarks
– CloudGraph® is a registered trademark of TerraMeta Software LLC
– Java™ is a trademark of Oracle Corporation
– HBase™ is a trademark of Apache Software Foundation
Copyright © TerraMeta Software, Inc – 2012,2013 – All Rights Reserved
References
• BIOXSD – http://bioxsd.org
• Chemical Markup Language (CML) – http://xml-cml.org
• Health Level 7 (HL7) – http://hl7.org
• Apache HBase™ – http://hbase.apache.org
• Apache Cassandra – http://cassandra.apache.org
• MongoDB – http://www.mongodb.org
• PlasmaSDO™ – http://plasma-sdo.org, http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22plasma-sdo%22