
Spark meets Spring

Date post: 27-Jul-2015
Upload: markfisher
Mark Fisher, Pivotal
© Copyright 2015 Pivotal. All rights reserved.

[Diagram: Spring IO platform, with layers labeled IO Execution, IO Coordination, and IO Foundation. Components shown:]

•  Boot: Bootable, Minimal, Ops-Ready
•  Grails: Full-stack, Web
•  XD: Stream, Taps, Jobs
•  Spring Cloud
•  Web: Controllers, REST, WebSocket
•  Integration: Channels, Adapters, Filters, Transformers
•  Batch: Jobs, Steps, Readers, Writers
•  Big Data: Ingestion, Export, Orchestration, Hadoop
•  Data: Relational Data Access, Non-Relational Data Access
•  Spring Core: Framework, Security, Reactor

Spring XD – 10,000 ft view

[Diagram: the Spring XD Runtime. Labels inside the runtime: Streams, Jobs, ingest, workflow, export, taps. Surrounding labels: Compute, HDFS, RDBMS, NoSQL, R/SAS, Predictive Modelling, Redis, a shell prompt (>_), and a link marked "bidirectional".]

Core Concepts

•  Modules
–  Source: polls an external source or is event driven
–  Processor: takes input and produces output
–  Sink: consumes input and outputs to an external system

•  Streams
–  Source | {Processor}0…n | Sink

•  Taps
–  Dynamically add taps to listen for events

•  Jobs
–  Directed graph of steps
–  ETL jobs based on Spring Batch
–  Workflow orchestration on Hadoop or Spark
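The "Source | {Processor}0…n | Sink" shape above can be sketched in plain Java. This is a toy model only, not the Spring XD API: the class `StreamSketch` and its methods `run` and `demo` are hypothetical names, and the modules are stood in for by `Supplier`, `Function`, and `Consumer`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Toy model of "Source | {Processor}0..n | Sink"; not the Spring XD API. */
public class StreamSketch {

    /**
     * Pulls items from the source until it returns null, passes each item
     * through the processors in order, and hands the result to the sink.
     */
    static <T> void run(Supplier<T> source, List<Function<T, T>> processors, Consumer<T> sink) {
        T item;
        while ((item = source.get()) != null) {
            T current = item;
            for (Function<T, T> processor : processors) {
                current = processor.apply(current);
            }
            sink.accept(current);
        }
    }

    /** Hypothetical helper: wires fixed inputs through two processors into a list. */
    static List<String> demo(List<String> inputs) {
        Iterator<String> it = inputs.iterator();
        Supplier<String> source = () -> it.hasNext() ? it.next() : null;

        List<Function<String, String>> processors = new ArrayList<>();
        processors.add(String::trim);
        processors.add(String::toUpperCase);

        List<String> collected = new ArrayList<>();
        run(source, processors, collected::add);
        return collected;
    }

    public static void main(String[] args) {
        // prints [HELLO, SPRING XD]
        System.out.println(demo(Arrays.asList(" hello ", "spring xd")));
    }
}
```

With an empty processor list the same `run` call degenerates to "Source | Sink", which matches the 0…n in the stream definition.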

Ingestion

•  Stream data from a variety of sources
•  Write data to a variety of sinks
•  Dozens of sources/sinks out of the box
–  Kafka, Files, Gemfire, HTTP, HDFS…
•  How to do this in XD?
–  Pipes and filters DSL

stream create tweets --definition "twitterstream | hdfs"

Streams

•  Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
•  Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Groovy, Python, Java
•  Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters

Real Time Processing

•  Counters
•  Model Scoring
•  Functional Stream Processing
–  RxJava, Spark Streaming
•  Custom Java, Python Code
•  Spring Data Repositories
–  Map data structures to objects
–  Store in Cassandra, Gemfire, Neo4j, MongoDB, Elasticsearch, Couchbase, JPA…

Real Time Processing

•  Custom Java code
•  Tap to count events
•  Tap to count occurrence of language

stream create tweets --definition "twitterstream | myProcessor | hdfs"

stream create tweetcount --definition "tap:stream:tweets > aggregate-counter"

stream create tweetlang --definition "tap:stream:tweets > field-value-counter --fieldName=lang"
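To make the tap idea concrete, here is a minimal plain-Java sketch of a stream whose events go to a main sink while a tap counts occurrences of the "lang" field. It is an illustration under assumptions, not how Spring XD implements taps or counters: `TapSketch`, `addTap`, `publish`, and `countLanguages` are hypothetical names, and a tweet is modeled as a `Map<String, String>`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Toy model of a tapped stream; not the Spring XD implementation. */
public class TapSketch {

    private final Consumer<Map<String, String>> sink;
    private final List<Consumer<Map<String, String>>> taps = new ArrayList<>();

    TapSketch(Consumer<Map<String, String>> sink) {
        this.sink = sink;
    }

    /** Taps can be added at any time; they see every event published afterwards. */
    void addTap(Consumer<Map<String, String>> tap) {
        taps.add(tap);
    }

    /** Delivers the event to the main sink, then to each tap. */
    void publish(Map<String, String> event) {
        sink.accept(event);
        for (Consumer<Map<String, String>> tap : taps) {
            tap.accept(event);
        }
    }

    /** Counts tweets per "lang" value through a tap, in the spirit of a field-value counter. */
    static Map<String, Long> countLanguages(List<Map<String, String>> tweets) {
        List<Map<String, String>> stored = new ArrayList<>(); // stands in for the hdfs sink
        TapSketch stream = new TapSketch(stored::add);

        Map<String, Long> counts = new TreeMap<>();
        stream.addTap(t -> counts.merge(t.getOrDefault("lang", "unknown"), 1L, Long::sum));

        for (Map<String, String> tweet : tweets) {
            stream.publish(tweet);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> tweets = Arrays.asList(
                Collections.singletonMap("lang", "en"),
                Collections.singletonMap("lang", "de"),
                Collections.singletonMap("lang", "en"));
        // prints {de=1, en=2}
        System.out.println(countLanguages(tweets));
    }
}
```

The main sink receives every event whether or not a tap is attached, which is the property that lets taps be added to a running stream.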

Dashboard and REST APIs

•  Spring XD REST APIs for Analytics
–  Easy to create counters, gauges
–  Invoked by JavaScript libraries (D3.js)
•  Spring Data Repositories
–  Map data structures to objects
–  Store in Cassandra, Gemfire, Neo4j, MongoDB, Elasticsearch, Couchbase, JPA…
•  Spring Data REST
–  Easy to expose REST APIs for repositories

Batch Processing

•  Job Orchestration
–  Hadoop (M/R, Pig, Hive)
–  Spark Batch
•  ETL
–  CSV to HDFS
–  HDFS to JDBC

job create myjob --definition "hdfsjdbc --resources=/xd/data/*.csv --names=forename,surname --tableName=people"

Runtime

[Diagram: Spring XD runtime architecture. Labels: XD UI, XD Shell, XD Admin (several instances, one marked Leader), ZK, XD Container (each holding modules), Kafka/RabbitMQ/Redis, Batch Job State DB, Analytics Repository.]

•  XD Admin
–  Assigns modules to containers
–  Re-assigns on failures for HA
•  ZooKeeper
–  Tracks container state
•  XD Container
–  Standalone, YARN, or Cloud Foundry
–  Loads modules
   ▪  Isolates class loader and context
–  Connects to data bus
   ▪  In-memory direct channel
   ▪  Kafka, RabbitMQ, Redis

Data Partitioning

•  PartitionKey: what field in the data to partition on
–  e.g. payload.customer.id
•  Partition ID: key.hashCode() % count
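The partition ID formula can be tried out with a short Java sketch. The names `PartitionSketch`, `partitionFor`, and `assign` are hypothetical. One assumption goes beyond the slide: Java's `hashCode()` can be negative, so the sketch uses `Math.floorMod` to keep the index in the range 0 to count - 1. How Spring XD itself treats negative hash codes is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of hash-based partition selection; names are illustrative only. */
public class PartitionSketch {

    /** Maps a partition key to an index in [0, count). */
    static int partitionFor(Object key, int count) {
        // Assumption of this sketch: floorMod instead of % so that a negative
        // hashCode() still yields a non-negative partition index.
        return Math.floorMod(key.hashCode(), count);
    }

    /** Groups keys into `count` buckets by their partition index. */
    static List<List<String>> assign(List<String> keys, int count) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, count)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // The same key always lands in the same bucket, so all events for one
        // customer id would be handled by the same consumer instance.
        System.out.println(assign(Arrays.asList("c-1", "c-2", "c-3", "c-1"), 3));
    }
}
```

Because the index depends only on the key and the partition count, changing the count reshuffles keys across partitions.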

Spring XD – Spark Stream Processing

•  XD handles the input/output
–  Message bus receiver in Spark cluster
–  Message bus sender in Spark cluster
•  Events processed at micro-batch level
•  Java and Scala interfaces
–  Implement process method
–  Process DStream input
–  Create DStream output
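As a rough illustration of micro-batch processing, the following plain-Java sketch maps each input batch to one output batch. It deliberately uses no Spark or Spring XD classes: a batch is modeled as a `List<String>`, the names `MicroBatchSketch`, `toBatches`, and `process` are hypothetical, and batches are cut by event count for simplicity, whereas Spark Streaming cuts them by time interval.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for micro-batch processing; not the Spark or Spring XD API. */
public class MicroBatchSketch {

    /** Cuts an event sequence into consecutive batches of at most batchSize events. */
    static List<List<String>> toBatches(List<String> events, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < events.size(); i += batchSize) {
            int end = Math.min(i + batchSize, events.size());
            batches.add(new ArrayList<>(events.subList(i, end)));
        }
        return batches;
    }

    /** Hypothetical process step: produces one word-count map per input batch. */
    static List<Map<String, Integer>> process(List<List<String>> input) {
        List<Map<String, Integer>> output = new ArrayList<>();
        for (List<String> batch : input) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String word : batch) {
                counts.merge(word, 1, Integer::sum);
            }
            output.add(counts);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "b", "a", "c", "c");
        // prints [{a=1, b=1}, {a=1, c=1}, {c=1}]
        System.out.println(process(toBatches(events, 2)));
    }
}
```

Counts are computed per batch rather than across the whole stream, which is the practical consequence of processing events at the micro-batch level.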

Resources

•  Code
–  https://github.com/spring-projects/spring-xd
–  https://github.com/spring-projects/spring-xd-samples
•  Docs
–  http://docs.spring.io/spring-xd/docs/current/reference/html/

September 14-17, 2015 Washington, DC

http://springone2gx.com/

BUILT FOR THE SPEED OF BUSINESS

