
Managing "Big Data" Application Complexity with CloudGraph

Date post: 27-Jan-2015
Upload: scott-cinnamond

Managing “Big Data” Application Complexity using CloudGraph®

Scott Cinnamond, TerraMeta Software Inc. http://cloudgraph.org

-Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever-increasing demand for domain model complexity-

Complexity Increases With Added Data Model Entities

[Chart: complexity (for columnar data store client applications) vs. # model entities / classes]

Why More App Complexity? (With Added Data Model Entities)

1. Column Mapping Difficult

2. Composite Row Key Mapping, Hashing, Salting and Formatting

3. Persistence Code Development, Refactoring and Maintenance

Typical Column Mapping Strategies

• Hard-Coded Names Embedded in Source Code
  – Not good

• Column Names in Java Constants File(s)
  – Better, but still really hard coded
  – Feasible with 5-10 entities, 50 attributes
  – With 500-1000 entities and 5000+ attributes? Not maintainable

• Custom XML Configuration
  – Create a “meta model” using, say, XML Schema and JAXB
  – Construct unique names and refer to them in source
  – Better, but an application-specific “one-off”
  – Does not solve “state” management challenges
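The constants-file trade-off is easier to see in code. Below is a minimal sketch that assumes a hypothetical Customer entity and uses plain Java maps in place of an HBase client. The column names are still literals compiled into the application. Each attribute needs both a constant and a line in the hand-written mapper, and that pair of artifacts multiplies across 500-1000 entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical constants file for ONE entity: every physical (terse) column
// qualifier is still a literal compiled into the application.
final class CustomerColumns {
    private CustomerColumns() {}

    static final String FAMILY = "f";
    static final String FIRST_NAME = "fn";
    static final String LAST_NAME = "ln";
    static final String EMAIL = "em";
}

// Hypothetical hand-written mapper: each attribute needs its own line here,
// and renaming a column means editing both classes.
final class CustomerMapper {
    private CustomerMapper() {}

    // Returns "family:qualifier" -> value, in insertion order.
    static Map<String, String> toColumns(String firstName, String lastName, String email) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put(CustomerColumns.FAMILY + ":" + CustomerColumns.FIRST_NAME, firstName);
        cells.put(CustomerColumns.FAMILY + ":" + CustomerColumns.LAST_NAME, lastName);
        cells.put(CustomerColumns.FAMILY + ":" + CustomerColumns.EMAIL, email);
        return cells;
    }
}
```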

CloudGraph Column Mapping: A Standards-Based Approach Using SDO and UML

[Diagram: CloudGraph stateful column key factories, with UML name “aliases”, SDO metadata “repository”, data graph “state”, logical names (readable), physical names (terse), business names, Java byte[] accessors, caching, object pooling, sequence management, entity ID mapping, row key mapping and marshalling]
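Here is a much-simplified sketch of the lookup-and-cache idea on this slide. It rests on stated assumptions: ColumnNameRepository, register() and qualifier() are hypothetical names, not the CloudGraph API. Each logical (readable) name is registered once with a physical (terse) alias. The encoded byte[] qualifiers are cached, so application code only ever asks for columns by logical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical metadata repository: logical name -> physical alias per entity.
// Assumes all register() calls happen at startup, before any lookups.
final class ColumnNameRepository {
    private final Map<String, Map<String, String>> aliases = new HashMap<>();
    // Cache of encoded qualifiers so String -> byte[] conversion happens once per column.
    private final Map<String, byte[]> qualifierCache = new ConcurrentHashMap<>();

    void register(String entity, String logicalName, String physicalName) {
        aliases.computeIfAbsent(entity, e -> new HashMap<>()).put(logicalName, physicalName);
    }

    byte[] qualifier(String entity, String logicalName) {
        byte[] cached = qualifierCache.computeIfAbsent(entity + "." + logicalName, key -> {
            Map<String, String> byEntity = aliases.get(entity);
            String physical = byEntity == null ? null : byEntity.get(logicalName);
            if (physical == null) {
                // Unknown names fail fast instead of silently writing a bad column.
                throw new IllegalArgumentException("no column mapping for " + key);
            }
            return physical.getBytes(StandardCharsets.UTF_8);
        });
        // Return a copy so callers cannot corrupt the cached array.
        return Arrays.copyOf(cached, cached.length);
    }
}
```

On the slide, the aliases originate in the UML model. The sketch replaces that step with explicit register() calls.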

Great, But Still, How Do We Keep Column Names Entirely Out of CRUD Source Code?

• Create | Update | Delete: CloudGraph SDO API (Service Data Objects)
• Read (Query): CloudGraph Query DSL (Domain Specific Language)

CloudGraph SDO: Your Complex Domain Model as a (Create | Update | Delete) API

• Drives All Column Mapping Transparently
• Granular Control over Data Graph Edits
• Convenient “Create Entity” Factory Methods
• Change Tracking Including History
• Rich Built-In Data Types
• 100% Compile-Time Checking
• Supports Multiple Inheritance Models
• Currently Uses PlasmaSDO™
  – See http://plasma-sdo.org
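Change tracking is what lets a persistence layer write only the columns that actually changed, and this sketch shows the principle. It is NOT the PlasmaSDO API. TrackedEntity and its methods are hypothetical, and the sketch assumes a flat property map rather than a full data graph.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, much-simplified data object that tracks its own edits.
final class TrackedEntity {
    private final Map<String, Object> values = new LinkedHashMap<>();
    private final Map<String, Object> originals = new LinkedHashMap<>();
    private final List<String> history = new ArrayList<>();

    // Loads a persisted value without marking it as changed.
    void load(String property, Object value) {
        values.put(property, value);
    }

    void set(String property, Object value) {
        Object old = values.get(property);
        if (Objects.equals(old, value)) {
            return; // no-op edits produce no column writes
        }
        // Remember only the first (persisted) value for the change summary.
        if (!originals.containsKey(property)) {
            originals.put(property, old);
        }
        values.put(property, value);
        history.add(property + ": " + old + " -> " + value);
    }

    Object get(String property) {
        return values.get(property);
    }

    // Properties whose current value differs from the persisted one:
    // these are the only columns an update would need to write.
    Map<String, Object> modified() {
        Map<String, Object> changed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : originals.entrySet()) {
            Object current = values.get(e.getKey());
            if (!Objects.equals(e.getValue(), current)) {
                changed.put(e.getKey(), current);
            }
        }
        return changed;
    }

    // Every edit in order, including ones later reverted.
    List<String> history() {
        return Collections.unmodifiableList(history);
    }
}
```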

CloudGraph SDO API Example
Uses Chemical Markup Language (CML) 2.4

https://github.com/cloudgraph/cml

CloudGraph Query DSL: Your Complex Domain Model as a Query API

• Drives All Column Mapping Transparently
• Intuitive, Almost “Fluent” English Appearance
• Logical Entity and Attribute Names Generated into API
• 100% Compile-Time Checking
• Currently Uses PlasmaQuery®
  – See http://plasma-query.org
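Generated names are what make compile-time checking possible, as this rough illustration shows. MoleculeProperty stands in for a generated metamodel and MoleculeQuery for a fluent query API. Both are hypothetical, neither is the PlasmaQuery API, and only equality conditions are supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

// Hypothetical generated metamodel: one constant per logical attribute,
// each carrying its physical column alias.
enum MoleculeProperty {
    ID("i"), TITLE("t"), FORMULA("f");

    final String column;

    MoleculeProperty(String column) {
        this.column = column;
    }
}

// Hypothetical fluent query: conditions are built from enum constants, never strings.
final class MoleculeQuery {
    private final List<MoleculeProperty> props = new ArrayList<>();
    private final List<Object> expected = new ArrayList<>();

    static MoleculeQuery where(MoleculeProperty property, Object value) {
        return new MoleculeQuery().and(property, value);
    }

    MoleculeQuery and(MoleculeProperty property, Object value) {
        props.add(property);
        expected.add(value);
        return this;
    }

    // Evaluates all equality conditions against one row keyed by property.
    boolean matches(Map<MoleculeProperty, Object> row) {
        for (int i = 0; i < props.size(); i++) {
            if (!Objects.equals(row.get(props.get(i)), expected.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Renders the filter with physical column names, as a storage layer would.
    String toColumnFilter() {
        StringJoiner joiner = new StringJoiner(" AND ");
        for (int i = 0; i < props.size(); i++) {
            joiner.add(props.get(i).column + " = '" + expected.get(i) + "'");
        }
        return joiner.toString();
    }
}
```

A misspelled attribute such as MoleculeProperty.FORMULAE fails at compile time. A misspelled string column name would only fail, or silently match nothing, at run time.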

CloudGraph Query DSL Example
Uses Chemical Markup Language (CML) 2.4

https://github.com/cloudgraph/cml

Why More Complexity? 2.) Composite Row Key Mapping, Hashing and Formatting

• More Model Entities:
  – Larger data graphs
  – More composite row key fields needed to find graphs
  – How to reliably map “deep” into graphs

• Row Key Field Hashing and Formatting
  – Critical for HBase partial-key scan API
  – Many data-type-specific idiosyncrasies
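Here is a minimal sketch of what row key field hashing and formatting involve. It assumes a hypothetical three-field key (tenant, numeric molecule ID, name) joined with a '|' delimiter. The variable-length tenant field is hashed to fixed-width hex. The numeric field is zero-padded, so HBase's byte-wise (lexicographic) row ordering agrees with numeric ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical composite row key: <hashed tenant>|<zero-padded molecule id>|<name>.
final class CompositeRowKey {
    static final char DELIMITER = '|';

    private CompositeRowKey() {}

    // String.hashCode() is fully specified by the Java platform, so it is stable
    // across JVMs; a production key would likely use a stronger hash.
    // %08x always yields 8 hex digits, including for negative hash codes.
    static String hashField(String value) {
        return String.format(Locale.ROOT, "%08x", value.hashCode());
    }

    // 19 digits fits Long.MAX_VALUE; Locale.ROOT keeps the digits ASCII.
    static String formatLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative ids need a different encoding");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    static byte[] build(String tenant, long moleculeId, String name) {
        String key = hashField(tenant) + DELIMITER + formatLong(moleculeId) + DELIMITER + name;
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

Without the padding, ID 10 would sort before ID 9 ("10" < "9" byte-wise), which breaks range and partial-key scans.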

CloudGraph HBase Composite Row Keys: A Configuration-Driven Approach Using SDO XPath

[Diagram: CloudGraph composite row keys, with configuration, SDO XPath, scan support, hashing, formatting, delimiters, expressions, field mapping, field ordering, deep graph traversal, partial key assembly, fuzzy row filter and hierarchical row filters]
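Partial key assembly boils down to turning the leading fields of a composite key into a scan range. This is a minimal sketch of the usual prefix-scan technique, not CloudGraph's implementation. PartialKeyRange is a hypothetical helper, and byte comparison is assumed to be unsigned and lexicographic, as it is for HBase row keys.

```java
import java.util.Arrays;

// Hypothetical helper: turns a partial composite key (leading fields only)
// into the [start, stop) row range a prefix scan would use.
final class PartialKeyRange {
    private PartialKeyRange() {}

    static byte[] startRow(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    // Smallest key greater than every key starting with prefix: drop trailing
    // 0xFF bytes, then increment the last remaining byte. An empty result
    // means "scan to the end of the table".
    static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // safe: this byte is not 0xFF, so no carry is needed
        return stop;
    }
}
```

A prefix ending in the '|' delimiter (0x7C) gets a stop row ending in '}' (0x7D). The range therefore covers exactly the rows whose leading fields match.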

Why More Complexity? 3.) Persistence Code Development, Refactoring and Maintenance

• Small Domain Model (e.g. CML, 164 Entities): 95,000 Lines
• “Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines

* Example from UML conversion from XML Schema of BIOXSD – see http://bioxsd.org/
** Example from UML adaptation of HL7 POCD/HD000040 Clinical Document
*** Example from UML conversion from XML Schema of Chemical Markup Language 2.4 – see http://xml-cml.org
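The two line counts on this slide imply a nearly constant amount of persistence code per entity. That suggests the effort grows roughly linearly with model size:

```latex
\frac{95{,}000\ \text{lines}}{164\ \text{entities}} \approx 579\ \text{lines/entity},
\qquad
\frac{174{,}000\ \text{lines}}{300\ \text{entities}} = 580\ \text{lines/entity}
```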

CloudGraph Code Generation: A Contract-First Approach in 4 Steps

1. Leverage Existing or Create UML Model(s)
   – Can be automatically reverse engineered from existing RDBMS schema
2. Map Repository Namespaces to Service Configurations
3. Define and Map Row Keys to Data Graphs
4. Add CloudGraph and Plasma Maven Artifacts and Generate Code

Resources

• Exchange Model Examples
  – https://github.com/cloudgraph/cml
  – https://github.com/cloudgraph/bioxsd
  – https://github.com/cloudgraph/hl7

• End-to-End Examples
  – https://github.com/cloudgraph/wordnet
  – http://wordnet.cloudgraph.org

Status/Legal

• Project Status
  – CloudGraph® is currently in private beta testing
  – Services for Cassandra, MongoDB and other data stores are under analysis
  – See http://cloudgraph.org for contact info and other details

• Licensing
  – CloudGraph® 0.5.5 Community Edition (CE) is open source, licensed under version 2 of the GNU General Public License

• Trademarks
  – CloudGraph® is a registered trademark of TerraMeta Software LLC
  – Java™ is a trademark of Oracle Corporation
  – HBase™ is a trademark of the Apache Software Foundation

Copyright © TerraMeta Software, Inc – 2012, 2013 – All Rights Reserved

References

• BIOXSD – http://bioxsd.org
• Chemical Markup Language (CML) – http://xml-cml.org
• Health Level 7 (HL7) – http://hl7.org
• Apache HBase™ – http://hbase.apache.org
• Apache Cassandra – http://cassandra.apache.org
• MongoDB – http://www.mongodb.org
• PlasmaSDO™ – http://plasma-sdo.org, http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22plasma-sdo%22