Spark Your Legacy (Spark Summit 2016)

Transcript
Page 1

Spark Your Legacy: How to distribute your 8-year-old monolith

Moran Tavori, Tzach Zohar // Kenshoo // June 2016

Page 2

Who’s this talk for?

Page 3

Who are we?

Tzach Zohar, System Architect @ Kenshoo

Moran Tavori, Lead backend developer @ Kenshoo

Working with Spark for ~2.5 years

Started with Spark version 1.0.x

Page 4

Who’s Kenshoo

10-year-old, Tel Aviv-based startup

Industry Leader in Digital Marketing

500+ employees

Heavy data shop

Page 5

The Problem

Page 6

Legacy “batch job” in Monolith

The job performs aggregations while applying complex business rules

Monolith is a Java application running hundreds of types of “jobs” (threads)

Tight coupling between jobs (same codebase, shared state)

Sharded by client

Doesn’t scale

Page 7

Solution: Spark!

Spark elegantly solves the business case for that job, as proven by POC

“API well suited for our use case”

“Very little boilerplate / plumbing code”

“Testable”

- from POC conclusions
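
Those conclusions are easy to see in code. A minimal sketch of the style, not Kenshoo's actual job (the Event schema and the numbers here are invented for illustration):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical input record; the real job aggregates Kenshoo's own data model.
case class Event(clientId: Long, campaign: String, cost: Double)

object PocSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("poc-sketch").setMaster("local[*]"))

    val events = sc.parallelize(Seq(
      Event(1L, "brand", 2.5),
      Event(1L, "brand", 1.5),
      Event(2L, "search", 4.0)))

    // The whole aggregation is two operators, with no plumbing code --
    // the "very little boilerplate" the POC conclusions call out.
    val totals = events
      .map(e => ((e.clientId, e.campaign), e.cost))
      .reduceByKey(_ + _)

    totals.collect().foreach(println)
    sc.stop()
  }
}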

Page 8

The “Greenfield” Dilemma

A: Legacy System

B: New Shiny System

Refactoring?

“Greenfield” project?

Page 10

Mitigating “Greenfield” risks

Page 11

Problem #1: Code is our only Spec

Page 12

Code is our only Spec

What exactly should the new system do?

Page 14

Don’t Assume. Measure.

- Kenshoo developers, circa 2014

Pages 15-17

Solution #1: Empirical Reverse Engineering
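
The result slides were images in the original deck. The technique itself, sketched below under assumptions (every path, file format, and helper name is hypothetical): replay recorded production inputs through the legacy job, keep its outputs as golden files, and assert the new job reproduces them exactly.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object GoldenFileCheck {
  // Parse one "key,value" line of a golden file (hypothetical format).
  def parseLine(line: String): (String, Double) = {
    val Array(k, v) = line.split(",")
    (k, v.toDouble)
  }

  // Stand-in for the real new aggregation job reading the recorded inputs.
  def runNewAggregation(sc: SparkContext, inputPath: String): RDD[(String, Double)] =
    sc.textFile(inputPath).map(parseLine)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("golden-check").setMaster("local[*]"))

    // Outputs captured from the legacy job on recorded production inputs.
    val golden = sc.textFile("/golden/legacy-output").map(parseLine)
    val fresh  = runNewAggregation(sc, "/golden/recorded-input")

    val missing    = golden.subtract(fresh) // legacy produced it, we did not
    val unexpected = fresh.subtract(golden) // we produced it, legacy did not
    assert(missing.isEmpty() && unexpected.isEmpty(), "new job diverges from legacy")

    sc.stop()
  }
}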

Page 18

Problem #2: Moving Target

Pages 19-21

Moving Target

Timeline (Q1 → Q2 → Q3): Legacy runs in production throughout; the New System is built alongside it; meanwhile Legacy keeps changing, becoming Legacy’

Pages 22-24

Solution #2: Share Code

1. Refactor legacy code to isolate business rules in a separate jar
2. Build the new system around this shared jar

(Diagram: the "business rules" jar is extracted from the Legacy Monolith, then used by both the Legacy Monolith and the New System)

Page 25

Solution #2: Share Code

List<Score> filtered = new LinkedList<>();
ScoreProviderData providerData = scoreProviderDao.getByScore(scores);
for (Score s : scores) {
    if (validProviderForScore(s, providerData)) {
        ScoreSource providerSource = providerData.getSource();
        if (providerSource == s.getSource()) {
            filtered.add(s);
        }
    }
}

Page 26

Solution #2: Share Code

public boolean shouldAggregateScore(ShouldAggregateKey key) { … }

List<Score> filtered = new LinkedList<>();
for (Score s : scores) {
    if (shouldAggregateScore(key(s))) {
        filtered.add(s);
    }
}

Page 27

Solution #2: Share Code

public boolean shouldAggregateScore(ShouldAggregateKey key) { … }

val scores: RDD[S] = // ...
val filtered: RDD[S] = scores.filter(s => shouldAggregateScore(key(s)))
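
One practical detail when sharing a JVM jar with Spark: the shared rules object is captured by the filter closure, so Spark must be able to serialize it to ship it to executors. A sketch under that assumption, with hypothetical stand-in types:

import org.apache.spark.rdd.RDD

object SharedRulesSketch {
  // Hypothetical stand-ins for the types on the slides.
  case class Score(source: String, value: Double)
  case class ShouldAggregateKey(source: String)

  // Wrapper over the shared business-rules jar. Extending Serializable is the
  // point: Spark serializes the filter closure together with everything it
  // captures, so the rules object must be serializable to reach executors.
  class AggregationRules extends Serializable {
    def key(s: Score): ShouldAggregateKey = ShouldAggregateKey(s.source)
    def shouldAggregateScore(k: ShouldAggregateKey): Boolean =
      k.source.nonEmpty // placeholder for the real legacy rule
  }

  def filterScores(scores: RDD[Score], rules: AggregationRules): RDD[Score] =
    scores.filter(s => rules.shouldAggregateScore(rules.key(s)))
}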

Page 28

Problem #3: Zero Diff Tolerance

Page 29

Zero Diff Tolerance

Some downstream modules might be sensitive to any new behavior

Page 30

Solution #3: Run Side-by-Side with Legacy

At the system level:

Page 31

Solution #3: Run Side-by-Side with Legacy

… and at the component level:
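
The side-by-side diagrams were images in the deck. In sketch form, a component-level comparison can key both outputs, full-outer-join them, and keep every key where the two systems disagree (the AggResult type is a hypothetical stand-in):

import org.apache.spark.rdd.RDD

object SideBySideDiff {
  // Hypothetical shape of one aggregated output row.
  case class AggResult(clientId: Long, metric: String, value: Double)

  // Keep every key where legacy and new outputs differ, including keys
  // present on only one side (hence the full outer join).
  def diff(legacy: RDD[AggResult], fresh: RDD[AggResult])
      : RDD[((Long, String), (Option[Double], Option[Double]))] = {
    val l = legacy.map(r => ((r.clientId, r.metric), r.value))
    val n = fresh.map(r => ((r.clientId, r.metric), r.value))
    l.fullOuterJoin(n).filter { case (_, (a, b)) => a != b }
  }
}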

Page 33

Problem #4: Test Reuse

Page 34

Test Reuse

Before: Legacy System Tests → Batch Job

Pages 35-37

Test Reuse

After: Legacy System Tests → New Aggregation System → Spark Cluster

Pages 38-41

Solution #4: Local Mode

Use Spark’s Local Mode to embed it in the new system: Legacy System Tests → New Aggregation System → Spark Local “Cluster”

Use the new system’s “Local Mode” to embed it in the legacy system

Ta Da! No test setup
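
In code, “no test setup” mostly comes down to the master URL: with local[*], the entire “cluster” lives inside the test JVM. A minimal sketch (the master URL and stop() call are standard Spark API; the test body is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

object LocalModeTestSketch {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs the driver and executors in this JVM, one task slot
    // per core, so tests need no external cluster at all.
    val sc = new SparkContext(
      new SparkConf().setAppName("embedded-tests").setMaster("local[*]"))
    try {
      val input  = sc.parallelize(Seq(1, 2, 3))
      val result = input.map(_ * 2).collect() // placeholder for the real job
      assert(result.sameElements(Array(2, 4, 6)))
    } finally {
      sc.stop() // stop so repeated test runs don't leak contexts
    }
  }
}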

Page 42

In Conclusion

Spark’s fluent APIs made it possible to share code with the old system

Spark’s local mode made testing easier

Common agile practices gave us control over the results before our system was client-facing

Page 44

Thank You

Questions?

