Date posted: 31-Jul-2015 | Category: Software | Uploaded by: tzach-zohar
© 2015 Kenshoo, Ltd. Proprietary Information
Spark Your Legacy: Distributing an 8-year Monolith
Tzach Zohar, Kenshoo, May 2015
Who?
Tzach Zohar
Architect @ Kenshoo
[email protected]
http://il.linkedin.com/in/tzachzohar
Where?
● Online advertising technology
● 9-year-old startup
● ~500 employees
● Data-intensive (aren’t we all?)
Agenda
● Project Background
● Why Not a Greenfield Project
● Refactoring Challenges
● Solutions
Domain: Data Aggregation
● Of: advertising metrics
● On: versatile, batched, occasionally re-stated input
● By: many different keys
● When: now + ~0.5 hour
● While: filtering and normalising per business rules
● For: eternity (data lives forever)
Domain: Data Aggregation
[Diagram: sources (slow, fast, custom, re-stated) → Normalize → Observations → Aggregate by X, by Y, by X + Y, ...]
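To make the domain concrete, here is a minimal single-JVM sketch of the normalise → filter → aggregate-by-several-keys flow. The record type, the key names `keyX`/`keyY` and both business rules are hypothetical placeholders, not Kenshoo's actual rules; the same input is simply aggregated by X, by Y and by X + Y.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MultiKeyAggregation {
    // Hypothetical observation; keyX and keyY stand in for the slide's X and Y.
    record Observation(String keyX, String keyY, long clicks) {}

    // Hypothetical normalisation rule: canonicalise key spelling.
    static Observation normalize(Observation o) {
        return new Observation(
                o.keyX().trim().toLowerCase(Locale.ROOT),
                o.keyY().trim().toLowerCase(Locale.ROOT),
                o.clicks());
    }

    // Hypothetical filtering rule: drop records with negative click counts.
    static boolean isValid(Observation o) {
        return o.clicks() >= 0;
    }

    // Normalise, filter, then sum clicks per value of the given key.
    static Map<String, Long> aggregate(List<Observation> input,
                                       Function<Observation, String> key) {
        return input.stream()
                .map(MultiKeyAggregation::normalize)
                .filter(MultiKeyAggregation::isValid)
                .collect(Collectors.groupingBy(key, TreeMap::new,
                        Collectors.summingLong(Observation::clicks)));
    }

    public static void main(String[] args) {
        List<Observation> input = List.of(
                new Observation("x1", "y1", 10),
                new Observation(" X1", "y2", 5),   // normalised to "x1"
                new Observation("x2", "y1", 7),
                new Observation("x2", "y1", -3));  // dropped by isValid

        System.out.println(aggregate(input, Observation::keyX));               // {x1=15, x2=7}
        System.out.println(aggregate(input, Observation::keyY));               // {y1=17, y2=5}
        System.out.println(aggregate(input, o -> o.keyX() + "+" + o.keyY()));  // {x1+y1=10, x1+y2=5, x2+y1=7}
    }
}
```

Each new key combination is just another key extractor over the same normalised stream, which is why "more keys and ad-hoc aggregations" is a natural requirement for this domain.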
Requirement: Better, Faster
● Higher throughput: business is growing
● More keys: and ad-hoc aggregations
● Linear scalability: anything else is not cost-effective
● Easy to enhance: by any decent developer
Chosen Design: Spark
[Diagram: sources → Normalize → Driver → HDFS + Spark Cluster, containing a Landing Zone and Spark Jobs that aggregate by X, by Y, by X + Y, ...]
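The point of moving the per-key jobs onto a Spark cluster is that each aggregation is split by key across machines. The sketch below simulates, inside one JVM and without any Spark dependency, the pattern Spark's `reduceByKey` follows conceptually: pre-aggregate inside each input partition, route each partial sum to an output partition by hashing its key (Spark's default `HashPartitioner` likewise takes the key's `hashCode` modulo the partition count, kept non-negative), then merge. The `Pair` record and all method names are illustrative assumptions, not Spark API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionedAggregation {
    // Hypothetical (key, value) pair; in Spark this would be an element of a pair RDD.
    record Pair(String key, long value) {}

    // Step 1 ("map side"): each input partition pre-aggregates its own records,
    // so at most one partial sum per key leaves each partition.
    static Map<String, Long> combineLocally(List<Pair> partition) {
        Map<String, Long> partial = new HashMap<>();
        for (Pair p : partition) {
            partial.merge(p.key(), p.value(), Long::sum);
        }
        return partial;
    }

    // Step 2 ("shuffle"): pick the output partition by hashing the key, so every
    // key ends up in exactly one output partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Step 3 ("reduce side"): each output partition merges the partial sums it receives.
    static Map<String, Long> reduceByKey(List<List<Pair>> inputPartitions, int numOutputPartitions) {
        List<Map<String, Long>> outputs = new ArrayList<>();
        for (int i = 0; i < numOutputPartitions; i++) {
            outputs.add(new HashMap<>());
        }
        for (List<Pair> partition : inputPartitions) {
            for (Map.Entry<String, Long> e : combineLocally(partition).entrySet()) {
                int target = partitionFor(e.getKey(), numOutputPartitions);
                outputs.get(target).merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        // Gathered into one sorted map only for display; the output partitions
        // hold disjoint key sets, so putAll never overwrites anything.
        Map<String, Long> result = new TreeMap<>();
        outputs.forEach(result::putAll);
        return result;
    }

    public static void main(String[] args) {
        List<List<Pair>> partitions = List.of(
                List.of(new Pair("x1", 10), new Pair("x2", 7), new Pair("x1", 1)),
                List.of(new Pair("x1", 5), new Pair("x3", 2)));
        System.out.println(reduceByKey(partitions, 2)); // {x1=16, x2=7, x3=2}
    }
}
```

Because no output partition needs data owned by another, adding partitions (and the machines that run them) spreads the keys further instead of adding coordination, which is the property the "linear scalability" requirement relies on.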
Great, but how do we get there?
From A: Legacy System to B: New Shiny System
● Refactoring?
● “Greenfield” project?
● ???
Challenge: Moving Target
[Diagram: timeline Q1, Q2, Q3; while the New System is being built, Legacy changes into Legacy’]
Challenge: Zero Diff Tolerance
● Different clients have different data, different customizations, different scales
● Our data is often validated against external sources
Challenge: Code Is Our Only Spec
But it isn’t necessarily a friendly one...
Challenge: Code Is Our Only Spec
What exactly should the new system do?
Challenge: Test Reuse?
Tests assume a single-server setup...
Challenge: Test Reuse?
Some are coupled with the current implementation...
Challenge: Tight Coupling
Implementation is tightly coupled with many other components
[Diagram: the Aggregator among the components it is coupled with: Kenshoo Server, Search Engines, SE API Facade, Web User Interface, Proxy Servers, Client's Website, Client Users, Client Systems / DWH, Entity Mgmt / DAO, Normalizers, Optimization Algorithms, Data Providers / Score SQL Builder, Client Configuration, SEM Entity Data, Performance Data, Campaign Generation Tools (RTC, KW Tool), Report Generation, Bulk Editing and Advanced Features, Conf. DAO, Kenshoo Editor, FTP Sites, Tracking Processor]
HELP ME!
Challenge: Paradigm Shift
How do you gradually refactor a single-node Java application into a distributed Spark application?
Solution #1: Shared Code
[Diagram: Legacy System and New System, each containing the Core Business Rules]
1. Refactor legacy code to create stand-alone jar
2. Build new system around this core code
Solution #1: Shared Code
Business rules refactored into Java static methods, to avoid serialization issues in Spark
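Why static methods help: Spark ships the functions passed to its operations to executors using Java serialization, and a function that calls an instance method drags the enclosing object along with it. The stdlib-only sketch below reproduces the effect with plain `java.io` serialization. `SerializableFunction` mirrors Spark's Java function interfaces (which extend `java.io.Serializable`); `BusinessRules` and `LegacyRuleService` are hypothetical names, not code from the talk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class StaticRuleDemo {
    // Stand-in for Spark's Java function interfaces, which extend Serializable.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical business rule written as a static method: calling it needs
    // no enclosing object, so nothing extra has to be serialized.
    static final class BusinessRules {
        private BusinessRules() {}

        static long normalizeClicks(long rawClicks) {
            return Math.max(0L, rawClicks);
        }
    }

    // Hypothetical legacy service holding the same rule as an instance method.
    // It does not implement Serializable (for example, because it holds a connection).
    static final class LegacyRuleService {
        long normalizeClicks(long rawClicks) {
            return Math.max(0L, rawClicks);
        }
    }

    // Returns true if the object survives Java serialization, as a Spark
    // function must before it can be shipped to executors.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerializableFunction<Long, Long> staticRule = BusinessRules::normalizeClicks;

        LegacyRuleService service = new LegacyRuleService();
        // A bound method reference captures 'service', which is not serializable.
        SerializableFunction<Long, Long> instanceRule = service::normalizeClicks;

        System.out.println("static rule serializable:   " + isSerializable(staticRule));   // true
        System.out.println("instance rule serializable: " + isSerializable(instanceRule)); // false
    }
}
```

In a Spark job, the second case surfaces as a "Task not serializable" error when the job is submitted; the static version only requires the shared jar to be on the executors' classpath.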
Solution #3: Local Mode Testing
[Diagram: Spark embedded in the New Aggregation System, which is embedded in the Legacy System]
1. Embed Spark in Aggregation System
2. Embed Aggregation System in Legacy
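The two embedding steps can be sketched with plain interfaces. Everything here is hypothetical: `AggregationSystem` stands for the new system's entry point, `LegacyAggregator` for whatever interface the legacy code and its single-server tests already call, and an in-memory implementation stands in for the Spark-backed one. In the real setup, the Spark context would be created with a local master URL (for example `local[*]`), which runs the driver and executors inside the test's own JVM.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class EmbeddedAggregationDemo {
    // Hypothetical normalized observation.
    record Observation(String keyX, long clicks) {}

    // Hypothetical entry point of the new aggregation system (step 1: in the
    // real system, Spark would run embedded behind this interface).
    interface AggregationSystem {
        Map<String, Long> sumByKeyX(List<Observation> input);
    }

    // In-memory stand-in for the Spark-backed implementation.
    static final class InMemoryAggregationSystem implements AggregationSystem {
        @Override
        public Map<String, Long> sumByKeyX(List<Observation> input) {
            return input.stream().collect(Collectors.groupingBy(
                    Observation::keyX, TreeMap::new,
                    Collectors.summingLong(Observation::clicks)));
        }
    }

    // Hypothetical interface the legacy code and its tests already depend on.
    interface LegacyAggregator {
        Map<String, Long> aggregate(List<Observation> input);
    }

    // Step 2: the legacy-facing interface delegates to the embedded new system,
    // so existing legacy tests exercise the new code path in a single JVM.
    static final class EmbeddedAggregator implements LegacyAggregator {
        private final AggregationSystem system;

        EmbeddedAggregator(AggregationSystem system) {
            this.system = system;
        }

        @Override
        public Map<String, Long> aggregate(List<Observation> input) {
            return system.sumByKeyX(input);
        }
    }

    public static void main(String[] args) {
        LegacyAggregator aggregator = new EmbeddedAggregator(new InMemoryAggregationSystem());
        System.out.println(aggregator.aggregate(List.of(
                new Observation("x1", 3), new Observation("x1", 4), new Observation("x2", 1))));
        // {x1=7, x2=1}
    }
}
```

Because the legacy tests only see `LegacyAggregator`, they can keep running unchanged while the implementation behind it is swapped for the new system.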
Solution #4: Side-by-Side
Both at the component level and at the system level
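Given the zero diff tolerance, a side-by-side run is only useful if every difference is surfaced. A minimal sketch, assuming both systems' outputs can be loaded into maps keyed by aggregation key: it reports missing keys, extra keys and differing values. The class name, method name and message format are hypothetical; the slides do not describe how the comparison was implemented.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SideBySideDiff {
    // Compare legacy and new outputs key by key. Any difference counts: a key
    // missing from either side, or the same key with a different value.
    static Map<String, String> diff(Map<String, Long> legacy, Map<String, Long> candidate) {
        Set<String> keys = new TreeSet<>(legacy.keySet());
        keys.addAll(candidate.keySet());
        Map<String, String> differences = new TreeMap<>();
        for (String key : keys) {
            Long expected = legacy.get(key);
            Long actual = candidate.get(key);
            if (!Objects.equals(expected, actual)) {
                differences.put(key, "legacy=" + expected + ", new=" + actual);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Map<String, Long> legacy = Map.of("x1", 15L, "x2", 7L, "x3", 2L);
        Map<String, Long> candidate = Map.of("x1", 15L, "x2", 8L, "x4", 1L);
        diff(legacy, candidate).forEach((k, v) -> System.out.println(k + ": " + v));
        // x2: legacy=7, new=8
        // x3: legacy=2, new=null
        // x4: legacy=null, new=1
    }
}
```

The same check applies at both levels the slide names: to the output of a single component, or to all outputs of a full system run on the same input.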