How to Make the Case for MarkLogic Internally

by Michael Bowers 2016-04-07

v. 1.2

2016 MarkLogic World

Abstract

The Church of Jesus Christ of Latter-day Saints has been doing NoSQL for over 8 years. This presentation shares some of the lessons learned.

• How we got NoSQL approved in spite of passionate opposition from the relational database team

• How we got widespread adoption of NoSQL in spite of widespread opposition from developers and management

• How NoSQL fell into the trough of disillusionment and we pulled it out

• How we expanded NoSQL usage by embracing both XML content and JSON data

• How we have to constantly train to prevent backsliding to relational habits


About the Author

Michael Bowers

• Principal Architect, LDS Church

• Author

– Pro CSS and HTML Design Patterns (published by Apress, 2007)

– Pro HTML5 and CSS3 Design Patterns (published by Apress, 2011)


Church of Jesus Christ of Latter-day Saints

• 15 million members (29,621 congregations worldwide)

• Humanitarian assistance in 185 countries

• Thousands of documents in 188 published languages

• 192 websites and applications in production with billions of page views annually running on hundreds of MarkLogic servers

hacky

How I got started with NoSQL

Lesson #1: NoSQL Champion

–Influential across the organization

–Convince developers

–Convince upper management

–Tough skinned

Every organization needs a NoSQL champion

Lesson #2: Must Get Management Buy-in

• Enterprises – Upper management likes Enterprise Databases

• Startups – Upper management likes Open Source Databases

• Lower management – Lower management wants to make engineers happy

Management can force a company to use a particular database, but developers may revolt

Lesson #3: Must Get Developer Buy-in

• Document has fastest development

• Key/Value has fastest performance

• Wide Column has Internet scale

Show developers how NoSQL can make them rock stars

Lesson #4: Train, Train, Train

• Data modeling – See data the way NoSQL sees it

• Querying

– Cookbooks

– Best practices

– Anti-patterns

Developers fail without lots of NoSQL training

How to Influence

• Four behavior styles

• Each requires a different type of influence

• Each person has a dominant style mixed with other styles

Style | Key Characteristics | How to influence
Analyzer | Thoughtful; soft and slow to speak; must be right | Show them hard facts and data; help them to be the most informed; support their standards and principles
Controller | Decisive; hard spoken & disagreeable; must be in control | Show them success through action; hold debates to prove the best solution; support their objectives and results
Persuader | Convincing; loud spoken & fun loving; must be liked | Show them influential opinions; praise and like them; support their fun-loving & risk taking
Stabilizer | Pleasing; slowly builds consensus; must be safe through consensus | Show them the popular opinion; listen and understand them; support them as a person

Source: EffectivenessInstitute.com

How to Influence "Analyzers"

• Train "analyzers" to be NoSQL experts

• Use facts: analyzers must be right

• Ensure facts align to dearest principles

– NoSQL databases scale horizontally

– NoSQL is typically a basic engine
  • Use NoSQL to build a database that is customized for one app for velocity, clustering, durability, and consistency

– NoSQL is typically ideal for Internet startups
  • Open source, trendy, customizable, cloud-optimized

– Few NoSQL engines are ideal for enterprises
  • True ACID compliance, Joins
  • Multi-model (document, graph, relational, dimensional)
  • Mature enterprise features
  • Usable out of the box
  • Exceptional developer productivity

How to Influence "Controllers"

• Show executives success through action

– Use NoSQL to build a real solution quickly and cheaply

• Have executives hold debates to find the best solution

– Controllers must always be in control

– Hold NoSQL bakeoffs

• Support executive objectives

– Reduces database license costs

– Reduces development costs

– Lowers infrastructure costs by dynamic sizing in the cloud

– Scales globally

– Provides global availability

– Enables data sovereignty

– Solves previously unsolvable problems

How to Influence "Persuaders"

• Showcase influential opinions

– Invite industry-renowned NoSQL experts to present to the organization

– Point out major companies using NoSQL

• Support their need for risk

– NoSQL is risky and revolutionary

– NoSQL is game-changing

• Support their need for fun

– NoSQL is new and exciting for developers

– Developers love managers who let them play

• Praise and show you like them

– Support "persuaders" by praising their decision to make developers happy

How to Influence "Stabilizers"

• Show the popular opinion

– Use DB-Engines Ranking: http://db-engines.com

• Show them MongoDB is the fourth most popular database (after Oracle, MySQL, and SQL Server)

• Show them Cassandra is the eighth most popular database

• Show them Redis is the ninth most popular database

– Choose the most popular NoSQL database that meets your needs

– Get a consensus on a specific NoSQL database

• Listen, understand, and support them as a person

– Take time to fully understand their individual concerns about NoSQL

– Appreciate the great personal risk they are taking to support NoSQL

Will your NoSQL DB be a victim of the Hype Cycle?

[Figure: hype cycle, derived from the Gartner Hype Cycle for Data Management. Phases: Technology Trigger, Inflated Expectations, Disillusionment, Enlightenment, Productivity. Technologies plotted: NoSQL, MapReduce, SQL, DB Appliances. Legend: Enterprise Ready, 1 to 5 years, 5 to 10 years.]

Is Velocity your Killer Argument for NoSQL?

Volume Per Day | Real-world 1K Transactions Per Day | Real-world 1K Transactions Per Second | Relational DB | Document DB | Key Value or Wide Column
8 GB | 8,640,000 | 100 | As Is | |
86 GB | 86,400,000 | 1,000 | Tuned* | As Is |
432 GB | 432,000,000 | 5,000 | Appliance | Tuned* | As Is
864 GB | 864,000,000 | 10,000 | Clustered Appliance | Clustered Servers | Tuned*
8,640 GB | 8,640,000,000 | 100,000 | | Many Clustered Servers | Clustered Servers
43,200 GB | 43,200,000,000 | 500,000 | | | Many Clustered Servers

* Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)
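The volume and per-day columns follow from the per-second rate by simple arithmetic. The sketch below reproduces them, assuming that "1K" means 1,000 bytes per transaction and that a gigabyte is 10^9 bytes; the slide does not state either, and its first two volume figures appear to be rounded down (8.64 GB shown as 8 GB, 86.4 GB as 86 GB).

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
BYTES_PER_TRANSACTION = 1_000    # assumption: "1K" means 1,000 bytes
BYTES_PER_GB = 1_000_000_000     # assumption: decimal gigabytes


def daily_load(transactions_per_second: int) -> tuple[int, float]:
    """Return (transactions per day, gigabytes per day) for a sustained rate."""
    per_day = transactions_per_second * SECONDS_PER_DAY
    gigabytes = per_day * BYTES_PER_TRANSACTION / BYTES_PER_GB
    return per_day, gigabytes


if __name__ == "__main__":
    # The six sustained rates listed in the table.
    for tps in (100, 1_000, 5_000, 10_000, 100_000, 500_000):
        per_day, gigabytes = daily_load(tps)
        print(f"{tps:>7,} tx/s -> {per_day:>14,} tx/day, about {gigabytes:,.2f} GB/day")
```

Running the same function against your own measured transaction size and peak rate gives a quick first guess at which row of the table applies.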

Is Horizontal Scale your Killer Argument for NoSQL?

Scale Clusters

[Figure: levels of scale, from One CPU Core to Multiple Cores, Multiple CPUs, Servers, Availability Zones, and Global Data Centers, shown within a Datacenter.]

Data is automatically spread across all servers to scale the storage, processing, and querying of big data.

• Data can be dispersed "randomly" across all servers for maximum parallel query performance

• Data can be sharded onto specific servers for data colocation, predictable data processing, predictable lookups, etc.

• Common data can be replicated across all servers for quick local access

Trade-off spectrum:

– One end: Consistent, Real-time, Few Data Copies, Less Compute, Vertical Scale, Less Availability

– Other end: Inconsistent, Near-time, Many Data Copies, More Compute, Horizontal Scale, Global Availability
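The three placement options on the Scale Clusters slide (random dispersal, sharding, and replication) can be sketched as follows. This is an illustration only, not how MarkLogic or any particular engine assigns data: the server names, the hash-modulo scheme, and all function names are assumptions made for the example.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical cluster members


def stable_hash(key: str) -> int:
    """Hash a string to an integer that is the same on every run and machine."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def disperse(document_id: str) -> str:
    """'Random' dispersal: hash the unique document id so load spreads evenly."""
    return SERVERS[stable_hash(document_id) % len(SERVERS)]


def shard(shard_key: str) -> str:
    """Sharding: hash a shared business key so related documents are colocated."""
    return SERVERS[stable_hash(shard_key) % len(SERVERS)]


def replicate() -> list[str]:
    """Replication: common reference data is copied to every server."""
    return list(SERVERS)


if __name__ == "__main__":
    orders = [("order-1", "customer-42"), ("order-2", "customer-42"), ("order-3", "customer-7")]
    for order_id, customer_id in orders:
        print(order_id, "dispersed to", disperse(order_id), "| sharded to", shard(customer_id))
    print("country codes replicated to", replicate())
```

The two hashing functions are deliberately identical; the only difference is the key. Hashing a unique id scatters documents for parallel queries, while hashing a shared key keeps everything for one customer on one server.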

Is Multimaster your Killer Argument for NoSQL?

Globally Consistent Clusters

[Figure: levels of scale, from One CPU Core to Multiple Cores, Multiple CPUs, Servers, Availability Zones, and Global Data Centers. Diagram: Datacenter 1 and Datacenter 2, each with Zone 1 and Zone 2; links labeled sync, sync, and async.]

Sharded data is read and written locally.

Shared "master" data is read everywhere, but written only to one data center.

Pros

• Global availability

• Real-time reads & writes

• Simple development: consistent data and simple multi-phase commit of shared master data

Cons

• All writes of shared data go to one cluster; this slows writes for distant locations

• Hard to create a global, federated view of sharded data (but not hard for MarkLogic)

• When a data center fails, any data committed in it but not yet transmitted to other data centers is lost until the failed data center comes back online

Trade-off spectrum:

– One end: Consistent, Real-time, Few Data Copies, Less Compute, Vertical Scale, Less Availability

– Other end: Inconsistent, Near-time, Many Data Copies, More Compute, Horizontal Scale, Global Availability
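The read and write rules for globally consistent clusters can be expressed as a small routing function. The sketch below is a hypothetical illustration of the rule stated on the slide (sharded data is written locally, shared master data is written to one data center, and everything is read locally); the data center names and the `Write` type are assumptions, not part of any product API.

```python
from dataclasses import dataclass

MASTER_DATACENTER = "datacenter-1"  # hypothetical: the one site that accepts shared-data writes


@dataclass(frozen=True)
class Write:
    key: str
    shared: bool   # True for shared "master" data, False for sharded data
    origin: str    # data center the client is connected to


def route_write(write: Write) -> str:
    """Return the data center that should commit this write."""
    return MASTER_DATACENTER if write.shared else write.origin


def route_read(origin: str) -> str:
    """Reads of both sharded and shared data are served by the local data center."""
    return origin


if __name__ == "__main__":
    local_order = Write(key="order-9", shared=False, origin="datacenter-2")
    price_list = Write(key="price-list", shared=True, origin="datacenter-2")
    print(local_order.key, "->", route_write(local_order))
    print(price_list.key, "->", route_write(price_list))
```

The second write shows the first con in the list: a client in the distant data center pays a cross-site round trip for every write to shared data.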

Is Data Model your Killer Argument for NoSQL?


A Relational Model of Data for Large Shared Data Banks

E. F. CODD, IBM Research Laboratory, San Jose, California

Information Retrieval, Volume 13 / Number 6 / June, 1970

Programs should remain unaffected when the internal representation of data is changed. …Tree-structured…inadequacies…are discussed. …Relations…are discussed and applied to the problems of redundancy and consistency….

KEY WORDS AND PHRASES: data base, data structure, data organization, hierarchies of data, networks of data, relations

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22

+ Data Narrative + Relationships = Contextual Information

1. Relational Model and Normal Form 1.1. INTRODUCTION This paper is concerned with the application of elementary relation theory…to…formatted data. …The problems…are those of data independence…and…data inconsistency…. The relational view…appears to be superior in several respects to the graph or network model…. …Relational view…forms a sound basis for treating derivability, redundancy, and consistency…. [and] a clearer evaluation…of

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS …Tables…represent a major advance toward the goal of data independence…

1.2.1. Ordering Dependence. …Programs which take advantage of the stored ordering of a file are likely to fail…if…it becomes necessary to replace that ordering by a different one.

1.2.2. Indexing Dependence. …Can application programs…remain invariant as indices come and go? …

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. …These programs fail when a change in structure becomes necessary. The…program…is required to exploit…paths to the data. …Programs become dependent on the continued existence of the…paths.
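Access path dependence, as quoted above, can be illustrated with a toy example. The sketch below is not from Codd's paper: the nested dictionaries stand in for a tree-structured file, the flat tuples stand in for a relation, and the hospital and surgeon names are borrowed from the sample document later in this deck.

```python
# Tree-structured layout: operations are reachable only via hospital -> surgeon.
tree_v1 = {"John Hopkins": {"Dorothy Oz": ["Heart Transplant"]}}

# The same facts after a reorganization: surgeon -> hospital.
tree_v2 = {"Dorothy Oz": {"John Hopkins": ["Heart Transplant"]}}

# Relational layout: flat (hospital, surgeon, operation) tuples.
rows = [("John Hopkins", "Dorothy Oz", "Heart Transplant")]


def operations_by_path(tree: dict, hospital: str, surgeon: str) -> list[str]:
    """Follows a fixed access path, so it depends on the shape of the tree."""
    return tree[hospital][surgeon]


def operations_by_value(relation: list[tuple[str, str, str]], hospital: str, surgeon: str) -> list[str]:
    """Selects rows by value and does not depend on any storage path."""
    return [op for h, s, op in relation if h == hospital and s == surgeon]


if __name__ == "__main__":
    print(operations_by_path(tree_v1, "John Hopkins", "Dorothy Oz"))
    try:
        operations_by_path(tree_v2, "John Hopkins", "Dorothy Oz")
    except KeyError as missing:
        print("path-dependent lookup broke after restructuring:", missing)
    print(operations_by_value(rows, "John Hopkins", "Dorothy Oz"))
```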


(Semantic & Structural) = Meaningful Knowledge

Is JSON or XML your Killer Argument for NoSQL?

<section>
  <heading>Data Models</heading>
  <paragraph>This paper shows….</paragraph>
  <paragraph>The <i>relational</i> model is no longer,<br/>the only game in town.</paragraph>
</section>

{"section": {
  "heading": "Data Models",
  "paragraphs": [
    {"paragraph": [{"s": "This paper shows…."}]},
    {"paragraph": ["The ", {"i": "relational"}, "model is no longer,", {"br": null}, "the only game in town."]}
  ]
}}

JSON
1. Best for structured data (text poured into objects)
2. Best for computer languages
3. No document type and immature schemas
4. Objects, arrays, floats, strings, booleans, nulls
5. No namespaces, No comments, No attributes
6. Easy, simple, compact, and fast to parse

XML
1. Best for structured text (structure added on top of text)
2. Best for natural languages
3. Document types with optional mature schemas
4. Objects, sets, all data types: dates, durations, integers, etc.
5. Namespaces, Comments, Attributes
6. Attributes add metadata; Namespaces embed object types
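The difference between "structure added on top of text" and "text poured into objects" shows up as soon as the two samples are read back as plain text. The sketch below uses only the Python standard library. The sample strings are copied from the slide, while the helper names (`xml_paragraphs`, `flatten`, `json_paragraphs`) are invented for this example.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = (
    "<section><heading>Data Models</heading>"
    "<paragraph>This paper shows….</paragraph>"
    "<paragraph>The <i>relational</i> model is no longer,<br/>"
    "the only game in town.</paragraph></section>"
)

JSON_DOC = """
{"section": {"heading": "Data Models", "paragraphs": [
  {"paragraph": [{"s": "This paper shows…."}]},
  {"paragraph": ["The ", {"i": "relational"}, "model is no longer,",
                 {"br": null}, "the only game in town."]}
]}}
"""


def xml_paragraphs(source: str) -> list[str]:
    """XML keeps the running text intact; the markup is layered on top of it."""
    root = ET.fromstring(source)
    return ["".join(paragraph.itertext()) for paragraph in root.iter("paragraph")]


def flatten(node) -> str:
    """Rebuild running text from the JSON encoding of mixed content."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(flatten(item) for item in node)
    if isinstance(node, dict):
        return "".join(flatten(value) for value in node.values())
    return ""  # null values, such as the "br" marker, carry no text


def json_paragraphs(source: str) -> list[str]:
    section = json.loads(source)["section"]
    return [flatten(entry["paragraph"]) for entry in section["paragraphs"]]


if __name__ == "__main__":
    print(xml_paragraphs(XML_DOC))
    print(json_paragraphs(JSON_DOC))
```

With XML, the parser hands back the text in reading order. With JSON, the application has to define its own convention for inline markup (here, single-key objects inside an array) and write its own code to reassemble the text, including any whitespace between fragments.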


Which databases best deliver your killer NoSQL arguments?

[Figure: database landscape. One axis runs from more structure (schema) to less structure (schemaless). The other axes show low latency operational velocity (hours, minutes, seconds, milliseconds, microseconds; 0.1Kt, 0.5Kt, 1Kt, 10Kt, 100Kt) and high bandwidth analytical volume (GBs, TBs, PBs). Each category lists products with the rank numbers shown on the slide.]

• Graph (Graph/RDF): #20 Neo4j, #32 MarkLogic, #41 OrientDB, #44 Titan

– Example: Surgeon, Operation, Person, and Hospital nodes connected by edges labeled "performed", "on", "works at", "at", "friend of", "likes", "has poor success at", and "fails at"

• Dimensional (Data Warehouse): #1 Oracle Exadata, #13 Teradata, #16 Hive, #28 Netezza, #29 Vertica, #33 Greenplum, #36 Amazon Redshift

• Dimensional (Live Analytics): #1 Oracle Exalytics, #19 SAP HANA

– Example (both): Hospital Dimension (Hospital Key, Hospital Attributes...), Surgeon Dimension (Surgeon Key, Surgeon Attributes...), Operation Dimension (Operation Key, Operation Attributes...), Drug Dimension (Drug Key, Drug Attributes...), and Drug Dose Facts (Hospital Key, Surgeon Key, Operation Key, Drug Key, Drug Dose)

• Relational (newSQL): #58 GemFire, #69 Oracle x10

• Relational (SQL): #1 Oracle DB, #2 MySQL, #3 SQL Server, #5 PostgreSQL, #6 DB2, #10 SQLite, #12 SAP AS, #19 SAP HANA, #21 Informix, #22 MariaDB

– Example (both): Surgeon (Surgeon Number, Surgeon First Name, Surgeon Last Name), Operation Codes (Operation Code, Operation Name), Operation (Hospital Number, Operation Number, Surgeon Number, Operation Code), and Hospital (Hospital Number, Hospital Name), with relationships labeled "performed at", "schedules", "performed by", "performs", "classified by", and "classifies"

• Document (XML, Doc Warehouse): #11 ElasticSearch, #14 Solr, #35 MarkLogic, #37 Sphinx

– Example document: Hospital Name: John Hopkins; Operation Number: 13; Operation Type: Heart Transplant; Surgeon Name: Dorothy Oz

Drug Name | Drug Manufacturer | Dose Size | Dose UOM
Minicillan | Drugs R Us | 200 | mg
Maxicillan | Canada4Less Drugs | 400 | mg
Minicillan | Drug USA | 150 | mg

• Document (JSON): #4 MongoDB, #24 Couchbase, #25 CouchDB, #32 MarkLogic, #41 OrientDB, #48 Cloudant

• Big Data (Raw): Hadoop, #18 Splunk

• Key/Value (Simple Key): #9 Redis, #23 Memcached, #26 DynamoDB, #31 Riak

• Wide-Column (Complex Key): #8 Cassandra, #15 Hbase
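To make the contrast between the document and relational examples concrete, the sketch below stores the sample operation both ways and answers the same question (total dose for operation 13) from each. The values come from the slide's sample document. The field names, table names, and the use of SQLite are assumptions chosen for the illustration, not something the slide prescribes.

```python
import sqlite3

# Document model: one self-contained record per operation.
operation_doc = {
    "hospitalName": "John Hopkins",
    "operationNumber": 13,
    "operationType": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "drugDoses": [
        {"drugName": "Minicillan", "manufacturer": "Drugs R Us", "doseSize": 200, "doseUom": "mg"},
        {"drugName": "Maxicillan", "manufacturer": "Canada4Less Drugs", "doseSize": 400, "doseUom": "mg"},
        {"drugName": "Minicillan", "manufacturer": "Drug USA", "doseSize": 150, "doseUom": "mg"},
    ],
}

# Relational model: the same facts split into tables and reassembled with a join.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE operation (
        operation_number INTEGER PRIMARY KEY,
        operation_type   TEXT,
        hospital_name    TEXT,
        surgeon_name     TEXT
    );
    CREATE TABLE drug_dose (
        operation_number INTEGER REFERENCES operation(operation_number),
        drug_name        TEXT,
        manufacturer     TEXT,
        dose_size        INTEGER,
        dose_uom         TEXT
    );
    """
)
db.execute(
    "INSERT INTO operation VALUES (?, ?, ?, ?)",
    (13, "Heart Transplant", "John Hopkins", "Dorothy Oz"),
)
db.executemany(
    "INSERT INTO drug_dose VALUES (?, ?, ?, ?, ?)",
    [(13, d["drugName"], d["manufacturer"], d["doseSize"], d["doseUom"])
     for d in operation_doc["drugDoses"]],
)

# Document: walk the nested list. Relational: join the two tables.
total_from_doc = sum(dose["doseSize"] for dose in operation_doc["drugDoses"])
total_from_sql = db.execute(
    """
    SELECT SUM(d.dose_size)
    FROM operation AS o
    JOIN drug_dose AS d ON d.operation_number = o.operation_number
    WHERE o.operation_number = 13
    """
).fetchone()[0]

print("total dose from document:", total_from_doc, "mg")
print("total dose from SQL join:", total_from_sql, "mg")
```

Both models give the same answer. The document keeps everything about one operation together, while the relational version spreads it across tables that must be joined at query time.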
