Page 1

PUTTING SPARK TO WORK WITH MARKLOGIC

Hemant Puranik, Technical Product Manager, Data Engineering, MarkLogic

© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Page 2


Agenda

How Apache Spark Complements MarkLogic

Spark and MarkLogic Use Cases

Deep Dive – Using Spark with MarkLogic

MarkLogic and Spark Integration – What’s Next

Q&A

Page 3

HOW APACHE SPARK COMPLEMENTS MARKLOGIC

Page 4


WHAT IS APACHE SPARK?

“For Large-Scale Data Processing”… an open-source cluster computing framework

Faster than Hadoop MapReduce

Built to deliver sophisticated analytics

Easy to use for manipulating large datasets

API abstractions over SQL and NoSQL data – analytics over Hadoop, RDBMS, MarkLogic, etc.

Unified engine for advanced analytics – streaming, machine learning, graph, etc.

[Stack diagram: Apache Spark Core with API libraries for Scala, Python, Java and R, topped by Spark SQL + DataFrames, Spark Streaming, MLlib (machine learning) and GraphX (graph computation)]

Page 5


MarkLogic vs Spark

OPERATIONAL (MarkLogic)
- Response in milliseconds
- 10,000 to 100,000 concurrent users
- Highly selective queries
- Read and write access
- Security and ACID compliance

ANALYTICAL (Spark)
- Response in seconds and minutes
- 10s or 100s of concurrent users
- Non-selective queries
- Read-only access
- Parallel computations on immutable data

Page 6


MarkLogic and Spark?

1. Aggregating data that comes in different shapes and source-specific formats

2. Highly concurrent transactions and secure query execution over changing data

3. Operational BI and Reporting in real time or near real time

4. Loading data from external sources into MarkLogic ‒ transforming data on the fly

5. Treating MarkLogic datasets as immutable in order to perform multi-step analytics

6. Looping insights derived from analytical processes into operational applications


Page 7

MARKLOGIC & SPARK USE CASES

Page 8


Batch Data Movement – Spark Data Pipeline

[Diagram: Spark data pipelines connect operational apps, documents, relational systems and other data sources with a MarkLogic data warehouse]

Page 9


Advanced Analytics – Machine Learning, Graph Analytics

[Diagram: documents and relational data in MarkLogic feed Spark MLlib and GraphX for advanced analytics]

Page 10


Streaming Analytics

[Diagram: event streams from KAFKA and FLUME feed SPARK STREAMING, which feeds MARKLOGIC for search & alerting]
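The diagram pairs Spark Streaming with MarkLogic for search and alerting. Below is a minimal sketch of the ingest side, assuming the Spark 1.5-era streaming API and the spark-streaming-kafka (Kafka 0.8) artifact; the broker address, topic name and the MarkLogic write step are illustrative placeholders, not part of the deck:

import java.util.*;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class StreamingSketch {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]");
    //Micro-batches every 10 seconds
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "localhost:9092"); //assumed local broker
    Set<String> topics = new HashSet<>(Arrays.asList("events")); //hypothetical topic

    //Direct stream of (key, value) pairs from Kafka
    JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        jssc, String.class, String.class,
        StringDecoder.class, StringDecoder.class, kafkaParams, topics);

    //For each micro-batch, push the raw events toward MarkLogic
    stream.foreachRDD(rdd -> rdd.foreach(record -> {
      //placeholder: insert record._2() as a document via the MarkLogic
      //REST API or the Hadoop connector's output format
    }));

    jssc.start();
    jssc.awaitTermination();
  }
}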

Page 11

DEEP DIVE

USING SPARK WITH MARKLOGIC

Page 12


Using the Hadoop Connector

Spark has built-in support for loading/saving Hadoop data

The MarkLogic Hadoop Connector represents MarkLogic documents in Hadoop-compatible input/output formats

The MarkLogic Hadoop Connector is certified against the Hortonworks and Cloudera platforms that ship with Spark

[Stack diagram: Spark API (Java, Scala, Python, R) over Spark Core, with the MarkLogic Hadoop Connector running on Hadoop YARN]
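The connector takes its connection settings from a standard Hadoop configuration file (the file loaded via configFilePath on the next page). A minimal sketch of such a file, assuming the mapreduce.marklogic.input.* property names documented for the MarkLogic Connector for Hadoop; host, port and credentials are placeholders:

<?xml version="1.0"?>
<configuration>
  <!-- placeholders: point these at your MarkLogic XDBC app server -->
  <property>
    <name>mapreduce.marklogic.input.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.port</name>
    <value>8000</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.username</name>
    <value>spark-user</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.password</name>
    <value>change-me</value>
  </property>
  <!-- basic mode: the connector splits and fetches documents itself -->
  <property>
    <name>mapreduce.marklogic.input.mode</name>
    <value>basic</value>
  </property>
</configuration>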

Page 13


OVERVIEW OF SPARK

Cluster Computing Framework

Each driver application has its own executor processes in the cluster

Resource management via a cluster manager:

Standalone

YARN (using the MarkLogic Hadoop Connector)

Mesos
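In practice the cluster manager is chosen at submit time; the same application jar runs unchanged against a local master or a YARN cluster. A sketch (class and jar names are hypothetical):

# Local mode, for development
spark-submit --class com.marklogic.spark.examples.WordCount \
  --master "local[4]" marklogic-spark-examples.jar

# YARN mode, with the MarkLogic Hadoop Connector jar on the classpath
spark-submit --class com.marklogic.spark.examples.WordCount \
  --master yarn --deploy-mode cluster \
  --jars marklogic-mapreduce2.jar marklogic-spark-examples.jar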

Page 14


Key Concept – Resilient Distributed Dataset (RDD)

Collections of objects physically partitioned across the cluster

Stored in RAM, on disk, or mixed

RDD operations: Transformations and Actions

Transformations produce a new RDD: Map, FlatMap, Filter, SortByKey, GroupByKey, …

Actions compute or save results: Collect, Reduce, Count, Save, …
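For example, with the (DocumentURI, MarkLogicNode) pair RDD built on the next page, a transformation merely describes a new RDD while an action triggers the computation. A schematic sketch; toString() on the key is assumed to yield the document URI:

//Transformation: a new RDD of URI strings is described, nothing runs yet
JavaRDD<String> uris = mlRDD.map(pair -> pair._1().toString());

//Action: forces evaluation and returns the result to the driver
long docCount = uris.count();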

Page 15

Loading MarkLogic Data Into Spark RDD

//Imports: Spark core classes plus the MarkLogic Connector for Hadoop classes
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import com.marklogic.mapreduce.DocumentInputFormat;
import com.marklogic.mapreduce.DocumentURI;
import com.marklogic.mapreduce.MarkLogicNode;

//First create the Spark context within Java
SparkConf conf = new SparkConf()
    .setAppName("com.marklogic.spark.examples")
    .setMaster("local");
JavaSparkContext context = new JavaSparkContext(conf);

//Create a configuration object and load the MarkLogic-specific properties
//from the configuration file
Configuration hdConf = new Configuration();
FileInputStream ipStream = new FileInputStream(configFilePath);
hdConf.addResource(ipStream);

//Create an RDD based on documents within the MarkLogic database.
//Load documents as (DocumentURI, MarkLogicNode) pairs.
JavaPairRDD<DocumentURI, MarkLogicNode> mlRDD = context.newAPIHadoopRDD(
    hdConf,                    //Configuration
    DocumentInputFormat.class, //InputFormat
    DocumentURI.class,         //Key class
    MarkLogicNode.class        //Value class
);

For more details refer to: How to use MarkLogic in Apache Spark applications.

Page 16


Spark Data Processing Pipeline

Built-in optimizer for data processing:

Lazy transformations

Pipeline execution

Transformations and data partitioning:

Narrow transformations (no data shuffling): Map, FlatMap, Filter, …

Wide transformations (data shuffling): SortByKey, ReduceByKey, GroupByKey, …

Tracks data lineage for recomputing an RDD in case of failure

[Diagram: input data flows through transformations T1–T4, producing RDD1–RDD4; actions A1 and A2 each trigger a job (Job 1, Job 2) that writes output data]
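To make the narrow/wide distinction concrete, assume a JavaRDD<String> named lines: filter and mapToPair run independently within each partition, reduceByKey forces a shuffle, and nothing executes until the final action materializes the whole pipeline as a job. A sketch with illustrative names:

JavaPairRDD<String, Integer> counts = lines
    .filter(line -> !line.isEmpty())          //narrow: runs per partition
    .mapToPair(line -> new Tuple2<>(line, 1)) //narrow: runs per partition
    .reduceByKey((a, b) -> a + b);            //wide: shuffles data across the cluster

//Only this action turns the lazy pipeline above into a job
List<Tuple2<String, Integer>> sample = counts.take(10);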

Page 17


MarkLogic WordCount Transformation

Sample input document:

<?xml version="1.0" encoding="UTF-8"?>
<complaint>
  <Complaint_ID>1172370</Complaint_ID>
  <Product>Credit reporting</Product>
  <Issue>Improper use of my credit report</Issue>
  <Sub-issue>Report improperly shared by CRC</Sub-issue>
  <State>CA</State>
  <ZIP_code>94303</ZIP_code>
  <Submitted_via>Web</Submitted_via>
  <Date_received>12/28/2014</Date_received>
  <Date_sent_to_company>12/28/2014</Date_sent_to_company>
  <Company>TransUnion</Company>
  <Company_response>Closed with explanation</Company_response>
  <Timely_response_>Yes</Timely_response_>
  <Consumer_disputed_>Yes</Consumer_disputed_>
</complaint>

Sample output – (name, count) and (name:value, count) pairs:

...
(Product,11)
(Product:Bank account or service,44671)
(Product:Consumer loan,12683)
(Product:Credit card,48400)
(Product:Credit reporting,54768)
(Product:Debt collection,62662)
(Product:Money transfers,2119)
(Product:Mortgage,143231)
(Product:Other financial service,191)
(Product:Payday loan,2423)
(Product:Prepaid card,626)
(Product:Student loan,11489)
(State,63)
(State:,5360)
(State:AA,10)
(State:AE,141)
(State:AK,465)
...

Page 18


MarkLogic WordCount – Spark Data Pipeline

Load MarkLogic documents into Spark RDD

Map documents to Name/Value pairs

Group Name/Value pairs

Count of Distinct Values for each Name

Map Name:Value to Occurrence Count

Count Occurrences for each Name:Value

Filter statistically insignificant Name:Value

Combine Name => Count and Name:Value => Count

Page 19

MarkLogic WordCount – Spark Data Pipeline

//Convert XML elements into name/value pairs where element content is the value
elementNameValues = mlRDD.flatMapToPair(ELEMENT_NAME_VALUE_PAIR_EXTRACTOR);

//Group element values for the same element name
elementNameValueGroup = elementNameValues.groupByKey();

//Count distinct values for each element name
elementNameDistinctValueCountMap = elementNameValueGroup.mapValues(DISTINCT_VALUE_COUNTER);

//Map the element name/value pairs to the occurrence count of each name:value pair
elementValueOccurrenceCountMap = elementNameValues.mapToPair(ELEMENT_VALUE_OCCURRENCE_COUNT_MAPPER);

//Aggregate the occurrence count of each distinct name:value pair
elementValueOccurrenceAggregateCountMap = elementValueOccurrenceCountMap.reduceByKey(VALUE_COUNT_REDUCER);

//Filter out the name:value occurrences that are statistically insignificant
relevantNameValueOccurrences = elementValueOccurrenceAggregateCountMap.filter(ELEMENT_VALUE_COUNT_FILTER);

//Combine the distinct value count for each element and the occurrence count
//for each name:value pair
valueDistribution = elementNameDistinctValueCountMap.union(relevantNameValueOccurrences);

For more details refer to: How to use MarkLogic in Apache Spark applications.

Page 20


Spark SQL & MarkLogic EXAMPLE

[Stack diagram: Spark SQL over Spark Core, with the MarkLogic Hadoop Connector running on Hadoop YARN]

Page 21


Spark SQL and DataFrames

DataFrames:

Abstraction for structured data – RDD + schema

Container for a logical plan

Multiple DSLs share the same query engine/optimizer

A MarkLogic DataFrame is created based on an RDD

[Diagram: Spark SQL, the DataFrame DSL and more share the DataFrame API over Spark Core; DataFrames are created from data sources, an existing RDD and more]

Page 22

1. Load MarkLogic documents into a Spark RDD

2. Create and configure a SQLContext within your Spark application

SQLContext sqlContext = new SQLContext(context);
sqlContext.setConf("spark.sql.shuffle.partitions", String.valueOf(10));

3. Create a Spark DataFrame – map each document into a tuple and apply the schema

JavaRDD<ConsumerComplaint> complaints = mlRDD.map(CONSUMER_COMPLAINT_EXTRACTOR);
DataFrame sqlRDD = sqlContext.applySchema(complaints, ConsumerComplaint.class);

4. Register a temporary table and execute SQL

sqlRDD.registerTempTable("ConsumerComplaints");
DataFrame resultsRDD = sqlContext.sql("SELECT company, state, COUNT(complaintID) as NumComplaints "
    + "FROM ConsumerComplaints "
    + "GROUP BY company, state "
    + "ORDER BY company, state");
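The aggregated DataFrame can then be inspected or pulled back to the driver like any other; a small usage sketch against the Spark 1.x DataFrame API:

//Print the first 20 aggregated rows to the console
resultsRDD.show();

//Or collect them for further processing in the driver
Row[] rows = resultsRDD.collect();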

Page 23


Spark Machine Learning & MarkLogic EXAMPLE

[Stack diagram: Spark SQL and Spark MLlib over Spark Core, with the MarkLogic Hadoop Connector running on Hadoop YARN]

Page 24


Spark Machine Learning – Pipeline Concepts

Transformer – abstraction for feature transformers and learned models

Estimator – abstraction of a learning algorithm; trains on data

TRAINING: Load Data (DataFrame) → Extract Features (Transformer) → Train Model (Estimator) → Learned Model

TESTING/PRODUCTION: Load Data (DataFrame) → Extract Features (Transformer) → Apply Learned Model (Transformer) → Act on Predictions

Page 25


Credit Risk Assessment using Machine Learning

<?xml version="1.0" encoding="UTF-8"?>
<CreditApplication>
  <AccountBalance>2000 or Above</AccountBalance>
  <AccountDurationMonths>12</AccountDurationMonths>
  <CreditHistory>All credits paid back duly</CreditHistory>
  <CreditPurpose>Computer and Electronics</CreditPurpose>
  <CreditAmount>1269</CreditAmount>
  <LengthOfCurrentEmployment>1-4 Years</LengthOfCurrentEmployment>
  <InstallmentPercentage>13</InstallmentPercentage>
  <Gender>female</Gender>
  <MaritalStatus>Married</MaritalStatus>
  <CurrentResidenceDuration>3 years</CurrentResidenceDuration>
  <ValuableAssets>Real Estate</ValuableAssets>
  <Age>42</Age>
  <Housing>Rent</Housing>
  <NumberOfCreditsWithBank>2</NumberOfCreditsWithBank>
  <Occupation>Skilled Professional</Occupation>
  <NumberOfDependents>1</NumberOfDependents>
  ---
</CreditApplication>

<?xml version="1.0" encoding="UTF-8"?>
<CreditApplication>
  <CreditRisk> --- </CreditRisk>
  <AccountBalance> ---
  <AccountDurationMonths> ---
  <CreditHistory> ---
  <CreditPurpose> ---
  <CreditAmount> ---
  <LengthOfCurrentEmployment> ---
  <Gender> ---
  <MaritalStatus> ---
  <CurrentResidenceDuration> ---
  <ValuableAssets> ---
  <Age> ---
  <Housing> ---
  <NumberOfCreditsWithBank> ---
  <Occupation> ---
  <NumberOfDependents> ---
  ---
</CreditApplication>

Page 26

1. Load credit rating training data from MarkLogic into a Spark DataFrame

2. Extract and transform the credit rating features using the VectorAssembler transformer

val assembler = new VectorAssembler().setInputCols(featureColumns).setOutputCol("features")
val featureVectors = assembler.transform(trainingData)

3. Transform the credit risk labels into ordered indices using StringIndexer

val labelIndexer = new StringIndexer().setInputCol(classColumn).setOutputCol("label")
val preparedTrainingSet = labelIndexer.fit(featureVectors).transform(featureVectors)

4. Train the model using the RandomForestClassifier estimator

val classifier = new RandomForestClassifier()
  .setImpurity("gini")
  .setMaxDepth(3)
  .setNumTrees(20)
  .setFeatureSubsetStrategy("auto")
  .setSeed(5043)
val model = classifier.fit(preparedTrainingSet)

Page 27

1. Load new credit applications from MarkLogic into a Spark DataFrame

2. Extract and transform the credit rating features using the VectorAssembler transformer

val assembler = new VectorAssembler().setInputCols(featureColumns).setOutputCol("features")
val creditFeatureVectors = assembler.transform(creditApplicationData)

3. Apply the previously learned model to predict credit risk for each new application

val predictions = model.transform(creditFeatureVectors)

4. Update the status of the credit application in MarkLogic based on the prediction (one possible sketch follows)
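The deck leaves step 4 without code. One hedged possibility is to write the predictions back through the Hadoop connector's output side; in this Java sketch the application-ID column, URI scheme and XML payload are illustrative, the connector classes come from the com.marklogic.mapreduce package, and hdConf is assumed to carry the mapreduce.marklogic.output.* connection properties:

//Map each prediction row to a (DocumentURI, Text) pair and save it via the
//connector's ContentOutputFormat
JavaPairRDD<DocumentURI, Text> updates = predictions.toJavaRDD().mapToPair(row -> {
    String uri = "/applications/" + row.getAs("applicationId") + ".xml"; //assumed column
    String doc = "<CreditRisk>" + row.getAs("prediction") + "</CreditRisk>"; //illustrative payload
    return new Tuple2<>(new DocumentURI(uri), new Text(doc));
});
updates.saveAsNewAPIHadoopFile("marklogic", DocumentURI.class, Text.class,
    ContentOutputFormat.class, hdConf); //path is nominal; output goes to MarkLogic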

Page 28

WHAT’S NEXT

MARKLOGIC AND SPARK INTEGRATION

Page 29


WHAT'S NEXT

Native Spark Connector for MarkLogic

No runtime dependency on Hadoop/YARN

Simplified API for working with Spark RDDs

[Stack diagram: Spark API (Java, Scala, Python, R) over Spark SQL and Spark Core, with the native Connector for MarkLogic in place of the Hadoop layer]

Page 30

//Create an RDD based on documents within the MarkLogic database.
sparkContext.newMarkLogicRDD(host, port, user, pwd, database, filterQuery);

//Save an arbitrary RDD to the MarkLogic database.
sparkContext.saveRDDToMarkLogicDatabase(host, port, user, pwd, database, …);

Page 31


WHAT'S NEXT

MarkLogic Database as a Spark SQL Data Source

Supports Spark SQL connectivity via the Data Source API

Simplified API for working with Spark DataFrames

[Stack diagram: Spark API (Java, Scala, Python, R) over Spark SQL and Spark Core, with the native Connector for MarkLogic]

Page 32

//Create a DataFrame based on predefined views within the MarkLogic database.
DataFrame df = sqlContext.read.MarkLogicView(host, port, …, …, viewName, filter);

//Save an arbitrary DataFrame to the MarkLogic database.
df.write.MarkLogic(documentURIMapper, [autoCreateView=false]);

Page 33


Key Takeaways

Spark is an open-source big data processing engine (faster than Hadoop MapReduce)

MarkLogic's strength is in operational use cases (i.e., highly concurrent transactional workloads)

MarkLogic and Spark are complementary in ‘Operational + Analytical’ use cases

Write your Spark Application leveraging the MarkLogic Hadoop Connector

Load MarkLogic data into an RDD and/or a DataFrame and use it in Spark apps

What’s Next – Native Spark connector for MarkLogic

Page 34


More Information

Blog on the Developer Community: How to use MarkLogic in Apache Spark applications

GitHub repository with an example application: https://github.com/HemantPuranik/MarkLogicSparkExamples

Page 35

Q&A

