Northeastern DB Class Introduction to Marklogic NoSQL april 2016

Date post: 15-Apr-2017
Upload: matt-turner
© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Matt Turner, CTO Media & Entertainment Introduction to MarkLogic NoSQL
Transcript

MarkLogic

Matt Turner, CTO Media & Entertainment

Introduction to MarkLogic NoSQL


Outline
- Something's Happening Here
- The Old and the New
- Data models
- Data access
- Discussion

Analysis, Operations, Access

DATA makes an impact

Stress on Traditional Data Approaches

Complexity
- Structured
- Unstructured
- Semi-structured
- Raw
- Streams of data

Constant change
- Agile analytics
- Fail-fast

Volume, Velocity, Variety

Volume
- Many months of system log files
- Every tweet
- Years of articles
- Relative to current size of operation

Velocity
- Streams of customer feedback to determine sentiment
- Real-time risk analysis
- Real-time Business Intelligence

Variety
- Database feeds
- Raw logs
- Web crawl data
- Articles
- Multi-media

ALSO: questions!

Examples
- Big Data: Gartner coined the "three Vs" description
- Data: Petabyte scale
- Nodes: Thousands


Leader Quadrant: Online Transaction Processing RDBMS (May 2002)
Traditional Mainstays

Leader Quadrant: Operational DBMS (Oct 2014)
Upstarts Storm the Field

MarkLogic: Best Operational Data Warehouse (Aug 2014)

WHAT BUSINESSES WANT

A Unified, Actionable 360° View of Data

Analysis, Operations, Access

DATA makes an impact

THE REALITY

Data Is In Silos
- Data is spread across disconnected databases
- M&A outpaces the speed of data integration
- Data needs to be delivered in real time

The Massive Cost of Integrating Data From Silos
- 60% OF THE COST of data warehouse projects is on ETL
- 80% of time WASTED by data scientists just wrangling data
- $36 Billion in SPENDING in 2015 on creating relational data silos

Sources:
- 80% of time spent by data scientists on just wrangling data: "Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets." Steve Lohr. "For Big-Data Scientists, 'Janitor Work' Is Key Hurdle to Insights." The New York Times. August 17, 2014.
- 60% of the cost of data warehouse projects is on ETL: In a report sponsored by Informatica, analysts at TDWI estimate between 60% and 80% of the total cost of a data warehouse project may be taken up by ETL software and processes.
- $36 Billion in spending on database management systems in 2015: Gartner. "Forecast: Enterprise Software Markets, Worldwide, 2011-2018, 4Q14." 2014.

THE IT CHALLENGE

Relational Databases with ETL Sacrifice Agility, Timeliness, and Cost
- All future data needs must be predictable
- New SQL queries require database re-indexing
- Siloed database changes require ETL re-writes

[Diagram: OLTP, archives, data marts, warehouse, and reference data, connected by ETL steps]

It's Complicated

[Diagram: OLTP, warehouse, data marts, archives, and reference data linked by ETL steps, alongside unstructured sources (video, audio, signals, logs, streams, social, documents, messages), metadata, and search]

The OLD: Let's Design the Application

(And pretend it's the 80s)

Let's Begin: Cast Members

Name   Hair Colour   Fulltime Employee?   Car type
Paul   Blond         Y
Alex   Auburn        Y                    Porsche
Dom    Black         Y                    Hummer

Column names: name, hr_colr, flltme_empl, car_tp

How many characters wide should this be? 8? 16? 32?
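The column-width question can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; because SQLite does not enforce declared VARCHAR lengths, fixed widths are emulated here with CHECK constraints. The width of 8 characters and the rejected sample row are assumptions chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every column and its maximum width must be decided before any data arrives.
# The width of 8 is an assumed value for illustration only.
conn.execute("""
    CREATE TABLE cast_members (
        name        TEXT NOT NULL CHECK (length(name) <= 8),
        hr_colr     TEXT CHECK (length(hr_colr) <= 8),
        flltme_empl TEXT CHECK (flltme_empl IN ('Y', 'N')),
        car_tp      TEXT CHECK (length(car_tp) <= 8)
    )
""")

rows = [
    ("Paul", "Blond", "Y", None),
    ("Alex", "Auburn", "Y", "Porsche"),
    ("Dom", "Black", "Y", "Hummer"),
]
conn.executemany("INSERT INTO cast_members VALUES (?, ?, ?, ?)", rows)


def try_insert(row):
    """Return True if the row fits the schema, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO cast_members VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# A hypothetical hair colour longer than 8 characters no longer fits.
fits = try_insert(("Sam", "Strawberry blond", "N", None))
row_count = conn.execute("SELECT COUNT(*) FROM cast_members").fetchone()[0]
```

Whatever width is picked, some future value will eventually exceed it, which is the point the slide is making.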


New Schema: Extend Ours!

Existing columns:

name   hr_colr   flltme_empl   car_tp
Paul   Blond     Y
Alex   Auburn    Y             porsche
Dom    Black     Y             Hummer

New columns:

house_road     town       city     postcode
11d Yonge Pk   Finsbury   London   N4 3NU
               Reading    London   N43

Hang on...
- If this table had 10k rows, issues?
  - First create new big schema
  - Then import rows across
  - Delete old table? Maybe not, legacy programs might use it!
- What if we want to select Road only?
  - Split out again
- More extensions?
  - House name and number?
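The migration steps on this slide can be sketched with Python's sqlite3 module. This is a minimal illustration, not a recommended migration procedure; the table name cast_members_v2 is hypothetical, and attaching the Yonge Pk address to Paul is an assumption, since the slide does not say whose address it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cast_members (name TEXT, hr_colr TEXT, flltme_empl TEXT, car_tp TEXT)"
)
conn.executemany(
    "INSERT INTO cast_members VALUES (?, ?, ?, ?)",
    [
        ("Paul", "Blond", "Y", None),
        ("Alex", "Auburn", "Y", "porsche"),
        ("Dom", "Black", "Y", "Hummer"),
    ],
)

# Step 1: create the new, bigger schema (hypothetical name cast_members_v2).
conn.execute("""
    CREATE TABLE cast_members_v2 (
        name TEXT, hr_colr TEXT, flltme_empl TEXT, car_tp TEXT,
        house_road TEXT, town TEXT, city TEXT, postcode TEXT
    )
""")

# Step 2: import the rows across; the new columns start out empty.
conn.execute("""
    INSERT INTO cast_members_v2 (name, hr_colr, flltme_empl, car_tp)
    SELECT name, hr_colr, flltme_empl, car_tp FROM cast_members
""")

# Step 3: the old table stays, because legacy programs might still read it.
tables = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# "Select Road only?" House and road share one column, so the application
# has to split the string itself. The pairing with Paul is assumed.
conn.execute(
    "UPDATE cast_members_v2 SET house_road = '11d Yonge Pk' WHERE name = 'Paul'"
)
house_road = conn.execute(
    "SELECT house_road FROM cast_members_v2 WHERE name = 'Paul'"
).fetchone()[0]
house, road = house_road.split(" ", 1)
```

Both tables now exist side by side, and every further change to the address layout means repeating the same steps.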


There is another way!
Create a new table and point to it from the old one!

name   hr_colr   flltme_empl   car_tp    Address
Paul   Blond     Y
Alex   Auburn    Y             porsche
Dom    Black     Y             Hummer

house_road     town       city     postcode
11d Yonge Pk   Finsbury   London   N4 3NU
               Reading    London   N43
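Pointing from one table to another is usually done with a foreign key and a join. A minimal sketch with sqlite3 follows; the addresses table name, the id column, and the pairing of the Yonge Pk address with Paul are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only when enabled

conn.execute("""
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        house_road TEXT, town TEXT, city TEXT, postcode TEXT
    )
""")
conn.execute("""
    CREATE TABLE cast_members (
        name TEXT, hr_colr TEXT, flltme_empl TEXT, car_tp TEXT,
        address INTEGER REFERENCES addresses(id)
    )
""")

conn.execute(
    "INSERT INTO addresses VALUES (1, '11d Yonge Pk', 'Finsbury', 'London', 'N4 3NU')"
)
conn.executemany(
    "INSERT INTO cast_members VALUES (?, ?, ?, ?, ?)",
    [
        ("Paul", "Blond", "Y", None, 1),  # assumed pairing, for illustration
        ("Alex", "Auburn", "Y", "porsche", None),
        ("Dom", "Black", "Y", "Hummer", None),
    ],
)

# Reading a person together with their address now needs a join.
result = conn.execute("""
    SELECT c.name, a.town, a.postcode
    FROM cast_members AS c
    LEFT JOIN addresses AS a ON a.id = c.address
    ORDER BY c.name
""").fetchall()
```

The original table no longer has to be rebuilt, but every query that needs the full record has to know about both tables.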


Now Let's Store Something More . . . Complicated

Transcript / Book
  Info
    Title = NL April 14
    Author = SNL Cast
  Section
    Chapter
      Page
        Paragraph = I love penguins because
      Page
        Paragraph = On the subject of food
    Chapter
      Page
  Section
    Chapter
    Chapter
    Chapter
      Paragraph
      Paragraph

title             author    Section
I love Penguins   S. Lion

Issues with Sections? How many columns?
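For contrast with the column question, here is a rough sketch of the same transcript held as one nested document, using only Python dictionaries and the json module. The key names (info, sections, chapters, pages, paragraphs) and the exact nesting are assumptions based on the labels on the slide.

```python
import json

# One nested document instead of a fixed set of columns.
book = {
    "info": {"title": "NL April 14", "author": "SNL Cast"},
    "sections": [
        {
            "chapters": [
                {
                    "pages": [
                        {"paragraphs": ["I love penguins because"]},
                        {"paragraphs": ["On the subject of food"]},
                    ]
                },
                {"pages": [{"paragraphs": []}]},
            ]
        },
        {"chapters": [{"pages": []}, {"pages": []}, {"pages": []}]},
    ],
}


def paragraphs(node):
    """Yield every paragraph string under node, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "paragraphs":
                yield from value
            else:
                yield from paragraphs(value)
    elif isinstance(node, list):
        for item in node:
            yield from paragraphs(item)


all_paragraphs = list(paragraphs(book))
serialized = json.dumps(book)  # the whole hierarchy round-trips as one value
```

Adding a section, chapter, or page changes the data only; no column count has to be chosen in advance.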


Don't Forget Taxonomies
Hierarchical levels of metadata
- Fixed to a specific business purpose
  - Can't be re-used in new contexts
- Each record can only be associated with one level
  - How many category fields?

[Taxonomy diagram: Category splits into Feature and Series; further levels include Cable and Broadcast, with genres such as Action, Drama, Comedy, Family, and Documentary]


Result
- Requires everything to be defined up front
- Data to be transformed and processed to fit the system
- Needs to be redone as information changes
- Costly to create and maintain, and only captures part of the data!

Title   ProductionDate   Category   AssetType   Length
Film1   3/1/14           Feature    HD Master   2:40
Show1   6/4/13           Series     HD720       0:40
Film2   6/4/05           Feature    Archive     1:55

[Taxonomy diagram: Category splits into Feature and Series; further levels include Cable and Broadcast, with genres such as Action, Drama, Comedy, Family, and Documentary, followed by "?"]

Traditional Technology

[Diagram: OLTP, warehouse, data marts, archives, and reference data linked by ETL steps, alongside unstructured sources (video, audio, signals, logs, streams, social, documents, messages), metadata, and search]

*NOTE: We only did this little bit! Remember?

The NEW! Enter NoSQL

Key-value
- Persistent hash-table on steroids
- Typically no single modeling paradigm (e.g. columns can be primitives, data structures, binaries, etc.)
- Examples: Amazon DynamoDB, Redis, Riak

Columnar
- Similar to K-V in some ways
- Column may be arranged in groups (families)
- Data types are usually the expected primitives
- Works well with value crunching (e.g. time series)
- Examples: HBase, Cassandra

Document
- URI-mapped (i.e. keyed) documents in lieu of rows
- Supports structured and unstructured content
- Nested context
- Examples: MarkLogic, MongoDB, Couchbase

Graph
- Deals with inter-object graphs
- Relationship oriented
- Think object cache (with pointers) on steroids
- Examples: Neo4J, AllegroGraph, InfoGrid
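The four categories can be approximated with plain Python data structures. This is only a rough mental model and not how any of the listed products store data; all keys, URIs, and values are hypothetical.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"session:42": b"\x00\x01 any bytes at all"}

# Columnar: each row key maps to column families, which group related columns.
columnar = {
    "sensor-1": {
        "readings": {"t0": 20.5, "t1": 21.0},  # time series of primitive values
        "meta": {"unit": "C"},
    }
}

# Document: nested documents keyed by URI instead of rows in a table.
documents = {
    "/cast/alex.json": {
        "name": "Alex",
        "hair": "Auburn",
        "car": {"type": "Porsche"},
    }
}

# Graph: nodes connected by labelled edges.
edges = [("Chevy", "isOn", "SNL"), ("SNL", "isOn", "NBC")]


def neighbours(node):
    """Return the nodes reachable from `node` over one outgoing edge."""
    return [target for source, _, target in edges if source == node]


readings = columnar["sensor-1"]["readings"]
average_reading = sum(readings.values()) / len(readings)
car_type = documents["/cast/alex.json"]["car"]["type"]
```

The shapes differ mainly in what the database understands about the value: nothing (key-value), typed columns (columnar), nested structure (document), or relationships (graph).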

THE DESIRED SOLUTION

A Database That Integrates Data Better, Faster, with Less Cost

The MarkLogic Alternative
An Operational and Transactional Enterprise NoSQL Database

EASY TO GET DATA IN: Flexible Data Model
- Data ingested as is (no ETL)
- Structured and unstructured data
- Data and metadata together
- Adapts to changing data and changing data structures

EASY TO GET DATA OUT: Ask Anything Universal Index
- Index once and query endlessly
- Real-time and lightning fast
- Query across JSON, XML, text, geospatial, and semantic triples in one database

100% TRUSTED: Enterprise Ready
- Reliable data and transactions (100% ACID compliant)
- Out-of-the-box automatic failover, replication, and backup/recovery
- Enterprise-grade security and Common Criteria certified

The SNL App


Flexible Data Model
Schema-agnostic, structure-aware
- No need to define up front
- Matched to complex content and metadata data modeling
- Data is managed in its most accessible, natural form
  - XML, JSON, RDF, geospatial
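What "schema-agnostic, structure-aware" might look like can be sketched with a toy in-memory store: documents of different shapes are kept as they arrive, and a query looks for a key wherever it occurs in the structure. This is an illustration in plain Python, not MarkLogic's API; the function names, URIs, and sample address are hypothetical.

```python
documents = {}  # URI -> document, stored exactly as loaded


def insert(uri, doc):
    """Store a document as is; no table definition is consulted."""
    documents[uri] = doc


def values_at(doc, key):
    """Yield every value stored under `key`, at any depth in the document."""
    if isinstance(doc, dict):
        for k, v in doc.items():
            if k == key:
                yield v
            yield from values_at(v, key)
    elif isinstance(doc, list):
        for item in doc:
            yield from values_at(item, key)


def find(key, value):
    """Return the URIs of documents holding `value` under `key` anywhere."""
    return sorted(
        uri for uri, doc in documents.items() if value in values_at(doc, key)
    )


insert("/cast/paul.json", {"name": "Paul", "hair": "Blond", "fulltime": True})
insert(
    "/cast/alex.json",
    {"name": "Alex", "hair": "Auburn", "fulltime": True, "car": {"type": "Porsche"}},
)
# A later document brings an address along; nothing is migrated.
insert(
    "/cast/dom.json",
    {
        "name": "Dom",
        "hair": "Black",
        "fulltime": True,
        "car": {"type": "Hummer"},
        "address": {"town": "Finsbury", "city": "London"},
    },
)

in_london = find("city", "London")
porsche_owners = find("type", "Porsche")
```

The store never needed to know that some documents have cars or addresses; the query follows whatever structure each document happens to have.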


Instead of THIS


Do it like THIS!


Search and Query
Search to find answers in documents, relationships, and metadata
- Automatic indexing of every data value, text and data structure
- Specialized indexes for data values (analytics, facets, sorting), geospatial and triples
- All updated in the context of ACID transactions to ensure data integrity and real-time access
- Accessible via fully programmable search API with full-text search, type-ahead suggestions, facets, snippeting, highlighted search terms, proximity boosting, relevance ranking, and language support

[Diagram: Rich Query Capability, with JavaScript, XQuery, SPARQL, in-database MapReduce, full-text search, semantic search, and geospatial search]
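The idea of indexing every word and every value at load time can be illustrated with a toy inverted index. This is a simplified sketch for flat documents, not MarkLogic's index implementation; the function names, URIs, and sample documents are hypothetical.

```python
from collections import defaultdict

word_index = defaultdict(set)   # word -> URIs of documents containing it
value_index = defaultdict(set)  # (key, value) -> URIs holding that exact value


def index_document(uri, doc):
    """Index every value and every word of a flat document when it is loaded."""
    for key, value in doc.items():
        value_index[(key, value)].add(uri)
        for word in str(value).lower().split():
            word_index[word].add(uri)


def search(words=(), **values):
    """Intersect the posting sets for all given words and key=value pairs."""
    postings = [word_index.get(w.lower(), set()) for w in words]
    postings += [value_index.get((k, v), set()) for k, v in values.items()]
    return sorted(set.intersection(*postings)) if postings else []


index_document("/ep/1.json", {"title": "I love penguins", "author": "S. Lion"})
index_document("/ep/2.json", {"title": "On the subject of food", "author": "S. Lion"})

by_word = search(words=["penguins"])
by_value = search(author="S. Lion")
combined = search(words=["food"], author="S. Lion")
```

Queries are answered by set intersection over the indexes, so a new kind of question does not require a new index as long as it is phrased in terms of words and values already captured.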


Timing

Context

Who's Smarter?

VS

Scheider, Linda; Kaminski, Juliane; Call, Josep; Tomasello, Michael. "Do domestic dogs interpret pointing as a command?" Animal Cognition (2012): 1-12. November 09, 2012.

Context!

Machines Don't Get Context . . .

Manu Sporny, Founder/CEO, Digital Bazaar, Inc.
http://www.cambridgesemantics.com/semantic-university/what-is-linked-data

Enter Semantics!

Manu Sporny, Founder/CEO, Digital Bazaar, Inc.
http://www.cambridgesemantics.com/semantic-university/what-is-linked-data

Semantics
Enterprise triple store, document store, and database combined
- Store and query billions of facts and relationships
- Leverage ontologies for domain and role specific context access to data and documents
- Efficient metadata management with relationships to ontologies
- Standards-based for ease of use and integration
  - RDF, SPARQL, and standard REST interfaces


Semantics to Model Relationships
Data model to manage relationships and link together data
- Triples describe single facts
- Collections of facts describe complex real-world scenarios

[Diagram: nodes Chevy, SNL, and NBC joined by "isOn" edges, ending with "isOn!"]
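Triples and the facts that follow from them can be sketched in a few lines of Python. The reading of the diagram used here (Chevy isOn SNL, SNL isOn NBC) and the treatment of isOn as a transitive relationship are assumptions; a real triple store would express this with RDF, SPARQL, and inference rules rather than the hypothetical helper functions below.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("Chevy", "isOn", "SNL"),
    ("SNL", "isOn", "NBC"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_transitive(predicate):
    """Add (a, p, c) whenever (a, p, b) and (b, p, c) exist, until nothing changes."""
    changed = True
    while changed:
        changed = False
        for a, _, b in match(p=predicate):
            for _, _, c in match(s=b, p=predicate):
                if (a, predicate, c) not in triples:
                    triples.add((a, predicate, c))
                    changed = True


before = match(s="Chevy")
infer_transitive("isOn")
after = match(s="Chevy")
```

Two stored facts yield a third that nobody entered, which is the kind of result a fixed category field cannot produce.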


Ontologies Instead of Categories
Actually model information as it is in the real world
- Not limited to a single purpose
- Ontologies for all categories of metadata
- Even impossible categories like fictional worlds


NoSQL and Semantics!


Real-Time Analytics
Range indexes can be used for
- Faceted search
- Aggregation and visualization
- Analytics, including custom user-defined functions
- Co-occurrence
- SQL, ODBC, and BI integration
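A range index can be thought of as a list of values kept in sorted order, so that range queries and counts do not have to scan every document. The sketch below uses Python's bisect module and Counter; it is a conceptual illustration rather than MarkLogic's implementation, and the URIs and the conversion of lengths to minutes are assumptions.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Hypothetical assets; lengths are in minutes (2:40, 0:40, 1:55).
docs = {
    "/asset/1": {"category": "Feature", "length_min": 160},
    "/asset/2": {"category": "Series", "length_min": 40},
    "/asset/3": {"category": "Feature", "length_min": 115},
}

# A range index keeps (value, uri) pairs sorted by value.
length_index = sorted((d["length_min"], uri) for uri, d in docs.items())
length_values = [value for value, _ in length_index]


def range_query(low, high):
    """Return URIs whose length lies in [low, high], found by binary search."""
    start = bisect_left(length_values, low)
    end = bisect_right(length_values, high)
    return [uri for _, uri in length_index[start:end]]


# Facets: counts per distinct value, as shown beside search results.
facets = Counter(d["category"] for d in docs.values())

long_assets = range_query(100, 200)
total_minutes = sum(length_values)  # a simple aggregate over the indexed values
```

Because the values are already sorted and grouped, facets, ranges, and aggregates can be answered from the index alone.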


Scalability, Elasticity and Cloud
Massive enterprise scalability and elasticity
- Scale horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents
- Process thousands of multi-document multi-statement transactions per second
- Start small and scale up or down to meet capacity and performance demands without over-provisioning or over-spending
- Fully cloud enabled for automated deployment and management on EC2
- Leverage dynamic configurations with Tiered Storage

[Diagram: a cluster of E-nodes and D-nodes; add nodes to scale out; automated failover]

Result: Enterprise-ready to power mission critical products
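Scale-out across nodes can be pictured as follows: data nodes (D-nodes) each hold part of the documents, and an evaluator node (E-node) routes writes and gathers query results from all of them. The sketch is a toy model under that assumption; the class names and the hash-based placement are illustrative and are not a description of MarkLogic's actual assignment, rebalancing, or failover mechanisms.

```python
import hashlib


class DNode:
    """Toy data node: holds a share of the documents and searches them locally."""

    def __init__(self):
        self.docs = {}

    def search(self, predicate):
        return [uri for uri, doc in self.docs.items() if predicate(doc)]


class ENode:
    """Toy evaluator node: routes writes and gathers results from every D-node."""

    def __init__(self, d_nodes):
        self.d_nodes = d_nodes

    def _pick(self, uri):
        # A stable hash of the URI decides which D-node stores the document.
        digest = hashlib.sha256(uri.encode("utf-8")).digest()
        return self.d_nodes[int.from_bytes(digest[:8], "big") % len(self.d_nodes)]

    def insert(self, uri, doc):
        self._pick(uri).docs[uri] = doc

    def search(self, predicate):
        # Scatter the query to all D-nodes, then merge the partial results.
        results = []
        for node in self.d_nodes:
            results.extend(node.search(predicate))
        return sorted(results)


cluster = ENode([DNode() for _ in range(3)])
for i in range(100):
    cluster.insert(f"/doc/{i}.json", {"n": i})

stored = sum(len(node.docs) for node in cluster.d_nodes)
multiples_of_20 = cluster.search(lambda doc: doc["n"] % 20 == 0)
```

With simple modulo placement like this, adding a node changes where most documents belong, so a real system needs a rebalancing strategy on top.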


Use Case: Deliver Better Information
- Present information based on relationships
- Go beyond traditional technology with depth of content
- Drive efficiency using semantic approach to tagging

Use Case: Go Beyond Search
- Concept instead of keyword search
- Related content and information drive the content discovery and new interactions
  - SNL40 continuous viewing
- Dynamically tailored to the user's specific attributes or activity

Use Case: Integrate Data
Integrate data across the automoti

Bob Pilz, Taxonomy Manager, Mitchell1

Semantics-driven search

[Graph diagram. Nodes: Talent (Kristen Wiig), Episode 4 (Anne Hathaway and Killers), Character (Maharelle Sister), Season 34, Segment (The Lawrence Welk Show), Date (10/4/08), Era. Edge labels: Acted in, Part of, Played, Aired on, Includes.]


Intelligent recommendation engine
