
Spark Your Legacy (Spark Summit 2016)

Transcript:
  • Spark Your Legacy: How to distribute your 8-year-old monolith

    Moran Tavori, Tzach Zohar // Kenshoo // June 2016

  • Who's this talk for?

  • Who are we?

    Tzach Zohar, System Architect @ Kenshoo

    Moran Tavori, Lead backend developer @ Kenshoo

    Working with Spark for ~2.5 years

    Started with Spark version 1.0.x

    https://www.linkedin.com/in/tzachzohar
    https://www.linkedin.com/in/moran-tavori-9518135a

  • Who's Kenshoo?

    10-year-old Tel Aviv-based startup

    Industry Leader in Digital Marketing

    500+ employees

    Heavy data shop

    http://kenshoo.com/

  • The Problem

  • Legacy batch job in Monolith

    Job performs aggregations applying complex business rules

    Monolith is a Java application running hundreds of types of jobs (threads)

    Tight coupling between jobs (same codebase, shared state)

    Sharded by client

    Doesn't scale

  • Solution: Spark!

    Spark elegantly solves the business case for that job, as proven by POC

    API well suited for our use case

    Very little boilerplate / plumbing code

    Testable

    - from POC conclusions
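
    To make the POC's "very little boilerplate" point concrete, here is a
    minimal sketch of the kind of aggregation Spark expresses fluently; the
    data shape and names are illustrative, not the actual Kenshoo job:

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch of an aggregation written against Spark's fluent RDD API.
    object AggregationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("poc").setMaster("local[*]"))
        // (client, cost) pairs standing in for real reporting events
        val events = sc.parallelize(Seq(("client-1", 10.0), ("client-2", 4.0), ("client-1", 2.5)))
        // Group and sum per client: the whole pipeline reads as one expression
        val totalsPerClient = events.reduceByKey(_ + _).collect()
        totalsPerClient.foreach { case (client, total) => println(s"$client -> $total") }
        sc.stop()
      }
    }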

  • The Greenfield Dilemma

    A: Legacy System

    B: New Shiny System

    Refactoring?

    Greenfield project?

  • How do we make the jump?

    photo credit: Szedő Gergő

    http://www.vitalmtb.com/photos/member/Selection-2012,4388/Szed-Gerg-Road-Gap,46058/NorbertSzasz,12703

  • Mitigating Greenfield risks

  • Problem #1: Code is our only Spec

  • Code is our only Spec

    What exactly should the new system do?

  • Don't Assume. Measure.

    - Kenshoo developers, circa 2014

    photo credit: realfoodkosher.com
    http://www.realfoodkosher.com/top-7-measuring-tools-every-kitchen-needs/

  • Solution #1: Empirical Reverse Engineering
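
    One concrete reading of "empirical reverse engineering", in the spirit of
    "Don't assume. Measure.": run the legacy job on real inputs, record what it
    actually produces, and treat that recorded output as the spec for the new
    system. The sketch below is an assumption of this practice; all types,
    names, and the file format are hypothetical:

    import java.io.{File, PrintWriter}

    // Sketch: capture the legacy job's real outputs as a golden file that the
    // new system's output must later match. All names are hypothetical.
    object GoldenCapture {
      case class Row(clientId: Long, metric: String, value: Double)

      // Stand-in for invoking the legacy aggregation on production-like input.
      def runLegacyJob(input: Seq[Row]): Seq[Row] =
        input.groupBy(r => (r.clientId, r.metric))
          .map { case ((c, m), rows) => Row(c, m, rows.map(_.value).sum) }
          .toSeq

      def main(args: Array[String]): Unit = {
        val input = Seq(Row(1, "clicks", 2), Row(1, "clicks", 3), Row(2, "cost", 1.5))
        val golden = runLegacyJob(input).sortBy(r => (r.clientId, r.metric))
        val out = new PrintWriter(new File("legacy-golden.csv"))
        try golden.foreach(r => out.println(s"${r.clientId},${r.metric},${r.value}"))
        finally out.close()
      }
    }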

  • Problem #2: Moving Target

  • Moving Target

    Timeline diagram (Q1 → Q2 → Q3): the legacy system keeps evolving while the new system is being built, so the behavior to reproduce is a moving target.

  • Solution #2: Share Code

    1. Refactor legacy code to isolate the business rules in a separate jar
    2. Build the new system around this shared jar

    Diagram: the business-rules jar is extracted from the Legacy Monolith, then shared by both the Legacy Monolith and the New System.

  • Solution #2: Share Code

    List<Score> filtered = new LinkedList<>();
    ScoreProviderData providerData = scoreProviderDao.getByScore(scores);
    for (Score s : scores) {
        if (validProviderForScore(s, providerData)) {
            ScoreSource providerSource = providerData.getSource();
            if (providerSource == s.getSource()) {
                filtered.add(s);
            }
        }
    }

  • Solution #2: Share Code

    public boolean shouldAggregateScore(ShouldAggregateKey key) { ... }

    List<Score> filtered = new LinkedList<>();
    for (Score s : scores) {
        if (shouldAggregateScore(key(s))) {
            filtered.add(s);
        }
    }

  • Solution #2: Share Code

    public boolean shouldAggregateScore(ShouldAggregateKey key) { ... }

    val scores: RDD[S] = // ...
    val filtered: RDD[S] = scores.filter(s => shouldAggregateScore(key(s)))
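
    Because the rule now lives in a single shared jar, one plain test pins its
    behavior for both the monolith and the Spark job. A minimal sketch with
    illustrative types, not Kenshoo's actual API:

    // Sketch of the shared business-rules jar plus a plain check that covers
    // both callers. Names are illustrative.
    object SharedRules {
      case class ShouldAggregateKey(scoreSource: String, providerSource: String)

      // Single authoritative rule, called by the legacy loop and the Spark filter alike.
      def shouldAggregateScore(key: ShouldAggregateKey): Boolean =
        key.scoreSource == key.providerSource
    }

    object SharedRulesCheck {
      import SharedRules._

      def main(args: Array[String]): Unit = {
        assert(shouldAggregateScore(ShouldAggregateKey("FEED", "FEED")))
        assert(!shouldAggregateScore(ShouldAggregateKey("FEED", "API")))
        println("Shared rule behaves the same wherever the jar is loaded")
      }
    }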

  • Problem #3: Zero Diff Tolerance

  • Zero Diff Tolerance

    Some downstream modules might be sensitive to any new behavior

  • Solution #3: Run Side-by-Side with Legacy

    At the system level: run the new system in parallel with the legacy job and compare their outputs before switching over.

    At the component level: feed both implementations the same inputs and diff the results.
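
    A minimal sketch of the component-level comparison under zero diff
    tolerance, assuming both implementations can be run on the same in-memory
    input; the names and the stand-in logic are hypothetical:

    // Side-by-side harness: run the legacy and new implementations on the same
    // input and fail loudly on any diff. All names are hypothetical.
    object SideBySideCheck {
      case class Score(clientId: Long, source: String, value: Double)

      // Stand-in for the legacy monolith's filtering logic.
      def legacyFilter(scores: Seq[Score]): Seq[Score] =
        scores.filter(_.source != "INVALID")

      // Stand-in for collecting the new Spark job's output for the same input.
      def newSystemFilter(scores: Seq[Score]): Seq[Score] =
        scores.filter(_.source != "INVALID")

      def main(args: Array[String]): Unit = {
        val input = Seq(Score(1, "PROVIDER_A", 0.9), Score(1, "INVALID", 0.1))
        val legacy = legacyFilter(input).toSet
        val fresh = newSystemFilter(input).toSet
        val diff = (legacy diff fresh) union (fresh diff legacy)
        // Zero diff tolerance: any divergence at all is a failure.
        require(diff.isEmpty, s"Outputs diverge on ${diff.size} records: $diff")
        println("Side-by-side check passed: zero diff")
      }
    }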

  • Problem #4: Test Reuse

  • Test Reuse

    Before: the Legacy System Tests exercise the batch job inside the monolith.

    After: the same Legacy System Tests must exercise the New Aggregation System, which runs against a Spark Cluster.

  • Solution #4: Local Mode

    Use Spark's local mode to embed it in the new system: the New Aggregation System runs Spark local instead of against a cluster.

    Use the new system's local mode to embed it in the legacy system, so the Legacy System Tests can drive it directly.

    Ta-da! No test setup.
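
    A minimal sketch of embedding Spark in-process via local mode, so the
    legacy system tests need no cluster or other test setup; the aggregation
    and names are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: run the new aggregation code against an in-process Spark, so the
    // existing tests can call it like any other library.
    object LocalModeExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("aggregation-under-test")
          .setMaster("local[*]") // all of Spark in this JVM, one thread per core
        val sc = new SparkContext(conf)
        try {
          val data = sc.parallelize(Seq(("client-1", 2), ("client-1", 3), ("client-2", 5)))
          // Stand-in aggregation; the real job applies the shared business rules.
          val totals = data.reduceByKey(_ + _).collectAsMap()
          assert(totals("client-1") == 5)
          println(s"Aggregated totals: $totals")
        } finally {
          sc.stop()
        }
      }
    }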

  • In Conclusion

    Spark's fluent APIs made it possible to share code with the old system

    Spark's local mode made testing easier

    Common agile practices gave us control over the results before our system was client-facing

  • Make your Greenfield special.

    photo credit: Getty Images

    http://inhabitat.com/world%E2%80%99s-highest-tennis-court-was-a-green-roof-atop-the-burj-al-arab-in-dubai/

  • Thank You

    Questions?
