by Michael Bowers 2016-04-07
v. 1.2
How to Make the Case for MarkLogic Internally
2016 MarkLogic World
Abstract
The Church of Jesus Christ of Latter-day Saints has been using NoSQL for over 8 years. This presentation shares some of the lessons learned:
• How we got NoSQL approved despite passionate opposition from the relational database team
• How we got widespread adoption of NoSQL despite widespread opposition from developers and management
• How NoSQL fell into the trough of disillusionment and how we pulled it out
• How we expanded NoSQL usage by embracing both XML content and JSON data
• How we must train constantly to prevent backsliding into relational habits
About the Author
Michael Bowers
• Principal Architect, LDS Church
• Author
  – Pro CSS and HTML Design Patterns (Apress, 2007)
  – Pro HTML5 and CSS3 Design Patterns (Apress, 2011)
Church of Jesus Christ of Latter-day Saints
• 15 million members (29,621 congregations worldwide)
• Humanitarian assistance in 185 countries
• Thousands of documents in 188 published languages
• 192 websites and applications in production with billions of page views annually running on hundreds of MarkLogic servers
Lesson #1: NoSQL Champion
– Influential across the organization
– Able to convince developers
– Able to convince upper management
– Thick-skinned
Every organization needs a NoSQL champion
Lesson #2: Must Get Management Buy-in
• Enterprises – Upper management likes enterprise databases
• Startups – Upper management likes open-source databases
• Lower management – Wants to make engineers happy
Management can force a company to use a particular database, but developers may revolt
Lesson #3: Must Get Developer Buy-in
• Document databases have the fastest development
• Key-value databases have the fastest performance
• Wide-column databases have Internet scale
Show developers how NoSQL can make them rock stars
Lesson #4: Train, Train, Train
• Data modeling – See data the way NoSQL sees it
• Querying
  – Cookbooks
  – Best practices
  – Anti-patterns
Developers fail without lots of NoSQL training
How to Influence
• There are four behavior styles
• Each requires a different type of influence
• Each person has a dominant style mixed with other styles
Style | Key characteristics | How to influence
Analyzer | Thoughtful; soft and slow to speak; must be right | Show them hard facts and data; help them to be the most informed; support their standards and principles
Controller | Decisive; hard-spoken and disagreeable; must be in control | Show them success through action; hold debates to prove the best solution; support their objectives and results
Persuader | Convincing; loud-spoken and fun-loving; must be liked | Show them influential opinions; praise and like them; support their fun-loving and risk-taking
Stabilizer | Pleasing; slowly builds consensus; must be safe through consensus | Show them the popular opinion; listen to and understand them; support them as a person
EffectivenessInstitute.com
How to Influence "Analyzers"
• Train "analyzers" to be NoSQL experts
• Use facts: analyzers must be right
• Ensure the facts align with their dearest principles
  – NoSQL databases scale horizontally
  – NoSQL is typically a basic engine
    • Use NoSQL to build a database customized for one app for velocity, clustering, durability, and consistency
  – NoSQL is typically ideal for Internet startups
    • Open source, trendy, customizable, cloud-optimized
  – Few NoSQL engines are ideal for enterprises
    • True ACID compliance, joins
    • Multi-model (document, graph, relational, dimensional)
    • Mature enterprise features
    • Usable out of the box
    • Exceptional developer productivity
How to Influence "Controllers"
• Show executives success through action
  – Use NoSQL to build a real solution quickly and cheaply
• Have executives hold debates to find the best solution
  – Controllers must always be in control
  – Hold NoSQL bake-offs
• Support executive objectives
  – Reduces database license costs
  – Reduces development costs
  – Lowers infrastructure costs by dynamic sizing in the cloud
  – Scales globally
  – Provides global availability
  – Enables data sovereignty
  – Solves previously unsolvable problems
How to Influence "Persuaders"
• Showcase influential opinions
  – Invite renowned industry NoSQL experts to present to the organization
  – Point out major companies using NoSQL
• Support their need for risk
  – NoSQL is risky and revolutionary
  – NoSQL is game-changing
• Support their need for fun
  – NoSQL is new and exciting for developers
  – Developers love managers who let them play
• Praise them and show you like them
  – Support "persuaders" by praising their decision to make developers happy
How to Influence "Stabilizers"
• Show the popular opinion
– Use DB-Engines Ranking: http://db-engines.com
• Show them MongoDB is the fourth most popular database (after Oracle, MySQL, and SQL Server)
• Show them Cassandra is the eighth most popular database
• Show them Redis is the ninth most popular database
– Choose the most popular NoSQL database that meets your needs
– Get a consensus on a specific NoSQL database
• Listen, understand, and support them as a person
  – Take time to fully understand their individual concerns about NoSQL
  – Appreciate the great personal risk they are taking to support NoSQL
Will your NoSQL DB be a victim of the Hype Cycle?
[Figure: hype cycle derived from the Gartner Hype Cycle for Data Management, with phases Technology Trigger, Inflated Expectations, Disillusionment, Enlightenment, and Productivity. Technologies plotted: NoSQL, MapReduce, DB Appliances, and SQL. Legend: Enterprise Ready; 1 to 5 years; 5 to 10 years.]
Is Velocity your Killer Argument for NoSQL?
Volume per day | Real-world 1K transactions per day | Real-world 1K transactions per second | Relational DB | Document DB | Key-value or wide-column DB
8 GB | 8,640,000 | 100 | As Is | |
86 GB | 86,400,000 | 1,000 | Tuned* | As Is |
432 GB | 432,000,000 | 5,000 | Appliance | Tuned* | As Is
864 GB | 864,000,000 | 10,000 | Clustered Appliance | Clustered Servers | Tuned*
8,640 GB | 8,640,000,000 | 100,000 | | Many Clustered Servers | Clustered Servers
43,200 GB | 43,200,000,000 | 500,000 | | | Many Clustered Servers
* Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)
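The first three columns of the table follow from simple arithmetic. Here is a minimal Python sketch. It assumes that "1K" means a 1,000-byte transaction and that the rate is steady over the 86,400 seconds in a day. Neither assumption is stated on the slide, but both reproduce the table's figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def daily_load(tps, txn_bytes=1_000):
    """Return (transactions per day, decimal GB per day) for a steady rate.

    Assumption: every transaction is txn_bytes in size (1K = 1,000 bytes here).
    """
    txns_per_day = tps * SECONDS_PER_DAY
    gb_per_day = txns_per_day * txn_bytes / 1_000_000_000
    return txns_per_day, gb_per_day


# Rates taken from the table's "per second" column.
for tps in (100, 1_000, 5_000, 10_000, 100_000, 500_000):
    txns, gb = daily_load(tps)
    print(f"{tps:>7,} tps -> {txns:>14,} txns/day -> {gb:>9,.2f} GB/day")
```

At 100 tps the sketch yields 8.64 GB per day, which the table rounds to 8 GB. At 1,000 tps it yields 86.4 GB, shown in the table as 86 GB.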
Is Horizontal Scale your Killer Argument for NoSQL?
Scale Clusters
[Figure: scaling path within a datacenter and beyond: one CPU core → multiple cores → multiple CPUs → servers → availability zones → global data centers]
Data is automatically spread across all servers to scale the storage, processing, and querying of big data.
• Data can be dispersed "randomly" across all servers for maximum parallel query performance
• Data can be sharded onto specific servers for data colocation, predictable data processing, predictable lookups, etc.
• Common data can be replicated across all servers for quick local access
Trade-offs along the scaling path:
– Vertical scale: consistent, real-time, few data copies, less compute, less availability
– Horizontal scale: inconsistent, near-time, many data copies, more compute, global availability
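The three data-placement options on the Scale Clusters slide can be illustrated with a generic Python sketch. It is not MarkLogic's document-assignment mechanism. The server names, the `region` shard key, and the `shard_map` are all hypothetical, and SHA-256 hashing stands in for whatever "random" dispersal a given database actually uses.

```python
import hashlib
from typing import Dict, List

SERVERS = ["server-1", "server-2", "server-3", "server-4"]  # hypothetical cluster


def place_dispersed(doc_id: str) -> List[str]:
    """Spread documents pseudo-randomly but deterministically by hashing the ID."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return [SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]]


def place_sharded(doc: Dict[str, str], shard_map: Dict[str, str],
                  key: str = "region") -> List[str]:
    """Colocate documents that share a shard-key value on one chosen server."""
    return [shard_map[doc[key]]]


def place_replicated(_doc_id: str) -> List[str]:
    """Copy common reference data to every server for quick local reads."""
    return list(SERVERS)


shard_map = {"emea": "server-2", "apac": "server-4"}  # hypothetical region map
print(place_dispersed("order-42"))                     # one server, chosen by hash
print(place_sharded({"id": "order-42", "region": "apac"}, shard_map))  # ['server-4']
print(place_replicated("currency-codes"))              # all four servers
```

Hash dispersal maximizes parallelism but scatters related documents. Sharding trades that parallelism for colocation and predictable lookups. Replication buys fast local reads at the cost of extra copies and fan-out on every write.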
Is Multimaster your Killer Argument for NoSQL?
Globally Consistent Clusters
[Figure: Datacenter 1 and Datacenter 2, each containing Zone 1 and Zone 2; the zones within each datacenter replicate synchronously (sync), and the two datacenters replicate asynchronously (async). Same scaling path from one CPU core to global data centers.]
Sharded data is read and written locally.
Shared "master" data is read everywhere, but written only to one data center.
Pros
• Global availability
• Real-time reads & writes
• Simple development: consistent data and simple multi-phase commit of shared master data
Cons
• All writes of shared data go to one cluster; this slows writes from distant locations
• Hard to create a global, federated view of sharded data (but not hard for MarkLogic)
• When a data center fails, any data committed in it but not yet transmitted to other data centers is inaccessible until the failed data center comes back online
Trade-offs along the scaling path:
– Vertical scale: consistent, real-time, few data copies, less compute, less availability
– Horizontal scale: inconsistent, near-time, many data copies, more compute, global availability
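The read/write rules on the Globally Consistent Clusters slide can be modeled as a toy Python sketch. This is an in-memory illustration, not a real replication protocol. The datacenter names are hypothetical. Synchronous zone replication inside each datacenter is omitted, and the asynchronous cross-datacenter link is modeled as a simple queue that `replicate()` drains.

```python
from collections import deque


class GlobalCluster:
    """Toy model: sharded data stays local; shared master data has one home."""

    def __init__(self, datacenters, master_dc):
        self.stores = {dc: {} for dc in datacenters}  # per-datacenter key/value store
        self.master_dc = master_dc                   # sole writer of shared data
        self.pending = deque()                       # async cross-datacenter queue

    def write(self, origin_dc, key, value, shared=False):
        """Commit a write and return the datacenter that committed it."""
        if shared:
            # Shared master data: always committed in the master datacenter,
            # then queued for asynchronous copy to every other datacenter.
            self.stores[self.master_dc][key] = value
            for dc in self.stores:
                if dc != self.master_dc:
                    self.pending.append((dc, key, value))
            return self.master_dc
        # Sharded data: committed only where the client is.
        self.stores[origin_dc][key] = value
        return origin_dc

    def replicate(self):
        """Deliver all queued async copies (as if the WAN link caught up)."""
        while self.pending:
            dc, key, value = self.pending.popleft()
            self.stores[dc][key] = value

    def read(self, local_dc, key):
        """Reads are always served from the local datacenter."""
        return self.stores[local_dc].get(key)


cluster = GlobalCluster(["dc-1", "dc-2"], master_dc="dc-1")
print(cluster.write("dc-2", "price-list", "v2", shared=True))  # dc-1
print(cluster.read("dc-2", "price-list"))                       # None
cluster.replicate()
print(cluster.read("dc-2", "price-list"))                       # v2
print(cluster.write("dc-2", "order-7", "open"))                 # dc-2
print(cluster.read("dc-1", "order-7"))                          # None
```

The first `None` marks the window in which a failure of dc-1 would strand the update. The last `None` shows why a global, federated view of sharded data needs extra work.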
Is Data Model your Killer Argument for NoSQL?
A Relational Model of Data for Large Shared Data Banks
E. F. CODD, IBM Research Laboratory, San Jose, California
Information Retrieval, Volume 13 / Number 6 / June, 1970
Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency….
KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22
+ Data Narrative + Relationships = Contextual Information
1. Relational Model and Normal Form 1.1. INTRODUCTION This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency…. The relational view…appears to be superior in several respects to the graph or network model…. …Relational view…forms a sound basis for treating derivability, redundancy, and consistency…. [and] a clearer evaluation…of
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS …Tables…represent a major advance toward the goal of data independence…
1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.
1.2.2. Indexing Dependence. …Can application programs…remain invariant as indices come and go? …
1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.
(Semantic & Structural) = Meaningful Knowledge
Is JSON or XML your Killer Argument for NoSQL?
<section>
  <heading>Data Models</heading>
  <paragraph>This paper shows….</paragraph>
  <paragraph>The <i>relational</i> model is no longer, <br/>the only game in town.</paragraph>
</section>
{"section": {
  "heading": "Data Models",
  "paragraphs": [
    {"paragraph": [ {"s": "This paper shows…."} ]},
    {"paragraph": [ "The ", {"i": "relational"}, " model is no longer, ", {"br": null}, "the only game in town." ]}
  ]
}}
JSON
1. Best for structured data (text poured into objects)
2. Best for computer languages
3. No document type and immature schemas
4. Objects, arrays, floats, strings, booleans, nulls
5. No namespaces, no comments, no attributes
6. Easy, simple, compact, and fast to parse
XML
1. Best for structured text (structure added on top of text)
2. Best for natural languages
3. Document types with optional mature schemas
4. Objects, sets, all data types: dates, durations, integers, etc.
5. Namespaces, comments, attributes
6. Attributes add metadata; namespaces embed object types
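To make the mixed-content point concrete, the sketch below parses both encodings of the slide's example with Python's standard `xml.etree.ElementTree` and `json` modules and recovers the same paragraph text. XML carries inline markup natively. The JSON version only works because of an ad hoc convention: strings are text, single-key objects such as `{"i": ...}` wrap inline markup, and `{"br": null}` marks a line break. The flattening rule used here, which drops nulls and concatenates values in order, is an assumption of this sketch.

```python
import json
import xml.etree.ElementTree as ET

# The slide's example, in both encodings.
XML_DOC = (
    "<section><heading>Data Models</heading>"
    "<paragraph>This paper shows….</paragraph>"
    "<paragraph>The <i>relational</i> model is no longer, "
    "<br/>the only game in town.</paragraph></section>"
)

JSON_DOC = """
{"section": {
  "heading": "Data Models",
  "paragraphs": [
    {"paragraph": [{"s": "This paper shows…."}]},
    {"paragraph": ["The ", {"i": "relational"}, " model is no longer, ",
                   {"br": null}, "the only game in town."]}
  ]
}}
"""


def xml_paragraphs(src):
    """XML mixed content: itertext() yields text and tails in document order."""
    root = ET.fromstring(src)
    return ["".join(p.itertext()) for p in root.iter("paragraph")]


def flatten(node):
    """Assumed JSON convention: strings are text, objects wrap inline markup."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(flatten(item) for item in node)
    if isinstance(node, dict):
        return "".join(flatten(v) for v in node.values() if v is not None)
    return ""


def json_paragraphs(src):
    section = json.loads(src)["section"]
    return [flatten(p["paragraph"]) for p in section["paragraphs"]]


print(xml_paragraphs(XML_DOC))
print(xml_paragraphs(XML_DOC) == json_paragraphs(JSON_DOC))  # True
```

The XML side needs no convention beyond the format itself. The JSON side depends on every producer and consumer agreeing on what `"i"`, `"br"`, and `"s"` mean.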
Which databases best deliver your killer NoSQL arguments?
[Figure: data models arranged from more structure (schema) to less structure (schemaless), positioned by low-latency operational velocity (hours, minutes, seconds, milliseconds, microseconds; 0.1Kt, 0.5Kt, 1Kt, 10Kt, 100Kt) and by high-bandwidth analytical volume (GBs, TBs, PBs). Each model is illustrated with the same surgeon, operation, hospital, and drug example.]
Rank numbers are from the DB-Engines Ranking.
• Graph/RDF: #20 Neo4j, #32 MarkLogic, #41 OrientDB, #44 Titan
• Dimensional
  – Data warehouse: #1 Oracle Exadata, #13 Teradata, #16 Hive, #28 Netezza, #29 Vertica, #33 Greenplum, #36 Amazon Redshift
  – Live analytics: #1 Oracle Exalytics, #19 SAP HANA
• Relational
  – newSQL: #58 GemFire, #69 Oracle x10
  – SQL: #1 Oracle DB, #2 MySQL, #3 SQL Server, #5 PostgreSQL, #6 DB2, #10 SQLite, #12 SAP AS, #19 SAP HANA, #21 Informix, #22 MariaDB
• Document
  – XML doc warehouse: #11 ElasticSearch, #14 Solr, #35 MarkLogic, #37 Sphinx
  – JSON: #4 MongoDB, #24 Couchbase, #25 CouchDB, #32 MarkLogic, #41 OrientDB, #48 Cloudant
• Raw big data: Hadoop, #18 Splunk
• Key/value (simple key): #9 Redis, #23 Memcached, #26 DynamoDB, #31 Riak
• Wide-column (complex key): #8 Cassandra, #15 HBase