
System Revolution- How We Did It

Date post: 09-Feb-2017
Upload: liveperson
Transcript
Page 1: System Revolution- How We Did It
Guy Harel
I assume you removed the Visit Manager slides. Without them, this slide does not make sense.
Page 2: System Revolution- How We Did It

System revolution: How we did it

Victor Perepelitsky

questions: www.meetup.com/ILTechTalks/events/226834931/
slideshare: www.slideshare.net/victorperepelitsky
email: [email protected]

Page 3: System Revolution- How We Did It

LivePerson customer example

salesman visitor from UK

chat lines

get session state activity reventschat lines

sales manager

invite chat UK visitors

see reports

invite

3

Page 4: System Revolution- How We Did It

LivePerson at a glance

4

● Account (brand) - a LivePerson customer
● Visitor - an individual who interacts with the business owner’s brand
● Agent - an account representative who may interact with visitors (examples: technical support, sales)
● Admin - an account representative who defines the business goals and normally manages agents in order to reach them effectively

Page 5: System Revolution- How We Did It

LivePerson at a glance

agent visitor Chat scale (2K req/sec)

Visitor scale (100K req/sec)

chat lines

get session state activity reventschat lines

admin

define business rules

see reports

Admin scale (under 100 req/sec)

invite

5

Page 6: System Revolution- How We Did It

Legacy

agent visitor Chat scale (2K req/sec)

Visitor scale (100K req/sec)

chat linesget session state activity revents

chat lines

admin

define business rulessee reports

Admin scale (under 100 req/sec)

Real Time Server

Offline and Reporting

6

Page 7: System Revolution- How We Did It

Legacy - stateful + account sticky

session from account B

RT server E, F, G

RT server A, C

RT server B, D

web server web server

session from account A

7

Page 8: System Revolution- How We Did It

Legacy

● Works

● Fast

● Partially resilient

● Huge amount of features

8

Page 9: System Revolution- How We Did It

Legacy - pains

● Hard to scale

● Hard to add new features

● Poor resource utilization

● Poor manageability

● Poor QoS

● Huge friction with customers

9

Page 10: System Revolution- How We Did It

Let's go back

agent visitor Chat scale (2K req/sec)

Visitor scale (100K req/sec)

chat lines

get session state activity reventschat lines

admin

define business rules

see reports

Admin scale (under 100 req/sec)

invite

10

Page 11: System Revolution- How We Did It

Proper system architecture

agent visitor Chat scale (2K req/sec)

Visitor scale (100K req/sec)

chat linesget session state activity revents

chat lines

admin

define business rulessee reports

Admin scale (under 100 req/sec)

real time

offline

reporting config

11

Page 12: System Revolution- How We Did It

The new dream

agent visitor

Chat scale (2K req/sec)

Visitor scale (100K req/sec)

chat linessession state

activity reventschat lines

admin

business rulessee reports

Admin scale (under 1K req/sec)

chat

offline

reporting config monitor and engage

* Business App / Extension

12

Page 13: System Revolution- How We Did It

Monitor and engage = shark

Shark manifesto

● Collects and makes available data about individuals (visitors) as they interact with the business owner’s brand (account)

● Acts in real time to engage visitors (chat, ad, call, etc.)

● Is a platform for business logic modules (sharklets), which can be developed and deployed independently

13

Page 14: System Revolution- How We Did It

Fundamental decisions

Requirements?

14

Page 15: System Revolution- How We Did It

Platform requirements

● E2E latency within DC < 30 ms
● Good resource utilization (CPU > 50%)
● Efficient - at least 500 req/sec per node
● Sharklet development lifecycle is independent
● High availability
  ○ uptime > 99.99999%
  ○ data loss < 0.01%
● Resilient - no service downtime when an external resource is unavailable (minimal degradation is allowed)
● Business logic correctness - 99.9%
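
The availability and data-loss targets above can be turned into concrete budgets with simple arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes a 365-day year and the 100K req/sec visitor scale quoted on the earlier slides.

```java
final class AvailabilityBudget {
    // Assumption: a 365-day year, ignoring leap years.
    static final double SECONDS_PER_YEAR = 365 * 24 * 3600.0;

    /** Maximum downtime per year, in seconds, allowed by an uptime percentage. */
    static double maxDowntimeSecondsPerYear(double uptimePercent) {
        return (100.0 - uptimePercent) / 100.0 * SECONDS_PER_YEAR;
    }

    /** Maximum number of requests per second that may be lost under a loss percentage. */
    static double maxLostRequestsPerSecond(double lossPercent, double requestsPerSecond) {
        return lossPercent / 100.0 * requestsPerSecond;
    }

    public static void main(String[] args) {
        // Uptime > 99.99999% leaves a budget of roughly three seconds per year.
        System.out.println(maxDowntimeSecondsPerYear(99.99999) + " s/year");
        // Data loss < 0.01% at 100K req/sec allows about 10 lost requests per second.
        System.out.println(maxLostRequestsPerSecond(0.01, 100_000) + " req/sec");
    }
}
```

Read this way, the uptime figure is a far stricter constraint than the data-loss figure, which is the kind of asymmetry the stateful-versus-stateless discussion relies on.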

15

Page 16: System Revolution- How We Did It

Fundamental decisions

Requirements? -> defined
Stateful or stateless?

16

Page 17: System Revolution- How We Did It

Stateful

stickiness is required

session 1 session 2 session 3 session 4

17

Page 18: System Revolution- How We Did It

Stateless

session 1 session 2 session 3 session 4

session data

Each request potentially requires access to session data store

18

Page 19: System Revolution- How We Did It

Facts that helped us to decide

1. Legacy works as “Stateful without HA”
2. A small data loss has a tiny customer impact (0.01% loss is good enough)
3. Stateless requires much more resources and initial effort
4. We can add an HA store in the future
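
To make the stateful option concrete, here is a minimal sketch of session stickiness: once a session is assigned to a server, every later request for that session goes to the same server, so its state can stay in that server's memory. The deck does not say how Shark routes requests, so `StickyRouter`, the hash-based assignment, and the server names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class StickyRouter {
    private final List<String> servers;
    // Remembers which server owns each session (hypothetical bookkeeping).
    private final Map<String, String> sessionToServer = new ConcurrentHashMap<>();

    StickyRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);
    }

    /** Returns the server for a session; the first call pins the session to a server. */
    String route(String sessionId) {
        return sessionToServer.computeIfAbsent(sessionId,
                id -> servers.get(Math.floorMod(id.hashCode(), servers.size())));
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>();
        servers.add("rt-1");
        servers.add("rt-2");
        StickyRouter router = new StickyRouter(servers);
        // Both calls print the same server name.
        System.out.println(router.route("session-42"));
        System.out.println(router.route("session-42"));
    }
}
```

The trade-off matches the list above: no shared session store is needed on each request, but a lost server takes its sessions with it unless an HA store is added later.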

19

Page 20: System Revolution- How We Did It

Stateful shark

ACCOUNT N

session B

RT server E, F, G

RT server A, C

RT server D

web server web server

session A

NN , B

20

Page 21: System Revolution- How We Did It

Fundamental decisions

Requirements? -> defined
Stateful or stateless?
What are the big parts?

21

Page 22: System Revolution- How We Did It

What are the big parts?

22

Page 23: System Revolution- How We Did It

Legacy - successful patterns

1. Requests are processed in memory
2. External resources are accessed asynchronously to visitor requests
3. Customer rules and data (AccountConfig) are kept in memory and may be updated in the background

23

Page 24: System Revolution- How We Did It

Legacy - pains

1. Order of calls (inside code + rules)
2. Business logic is not built from pluggable components
3. HTTP requests are tightly coupled with the logical levels (hard to move toward other protocols such as WebSockets)

24

Guy Harel
you have another slide with the same subject
Page 25: System Revolution- How We Did It

25

Guy Harel
characters slide in some rectangles
Page 26: System Revolution- How We Did It

SYNC - Fast CEP, engagements

ASYNC - slow actions, external resources access

sharklet A (sync handlers)

sharklet A (async handlers)

web visitor agent mobile

visitor

facade

adapter adapter adapter

Account Runtime Data

Message BUS

external resource

26

Page 27: System Revolution- How We Did It

Shark - The Big Parts

1. Facade - decouples real-world protocols from the logical layers
2. CEP - avoids call order management
3. Sync - very fast in-memory processing
4. Async - allows slow actions and external resource access
5. Account Runtime Store - allows in-memory access to customer configuration

27

Guy Harel
Sync --> Sync handling/ers. Async -- > Async handling/ers
Page 28: System Revolution- How We Did It

Fundamental decisions

Requirements? -> defined
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack

28

Page 29: System Revolution- How We Did It

Basic technology stack - ?

29

Page 30: System Revolution- How We Did It

We were practical

CEP technology?

30

Page 31: System Revolution- How We Did It

CEP - in a nutshell

31

Page 32: System Revolution- How We Did It

Drools - in a nutshell

32

Page 33: System Revolution- How We Did It

Drools - we tried to kill it

We had

● played with it - :)
● integrated into shark - :)
● made a POC using LivePerson logic - :)
● tested for performance - :(

33

Page 34: System Revolution- How We Did It

We played with more technologies

34

Page 35: System Revolution- How We Did It

And finally chose the solution

35

Page 36: System Revolution- How We Did It

Shark CEP - processing cycle

handler 1

handler 2

handler 3

Event Queue

ba

36

Page 37: System Revolution- How We Did It

Shark CEP - processing cycle

handler 1

handler 2

handler 3

Event Queue

a

b

37

Page 38: System Revolution- How We Did It

Shark CEP - processing cycle

handler 1

handler 2

handler 3

Event Queue

ba

a

38

Page 39: System Revolution- How We Did It

Shark CEP - processing cycle

handler 1

handler 2

handler 3

Event Queue

b

c

39

Page 40: System Revolution- How We Did It

Shark CEP - processing cycle

handler 1

handler 2

handler 3

Event Queue

b

c

40

Page 41: System Revolution- How We Did It

Shark CEP - processing cycle

handler 1

handler 2

handler 3

Event Queue

b

c

41

Page 42: System Revolution- How We Did It

Shark CEP - processing cycle

handler 1

handler 2

handler 3

Event Queue

42
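
The processing-cycle slides above show events leaving a queue, being offered to every handler, and handlers putting new events back on the queue until it is empty. A minimal sketch of that loop follows. It is not the Shark implementation: `ProcessingCycle`, the use of plain strings as events, and the handler signature are assumptions chosen to keep the example small.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

final class ProcessingCycle {
    // A handler receives one event and returns the new events it raises (possibly none).
    private final List<Function<String, List<String>>> handlers = new ArrayList<>();

    void register(Function<String, List<String>> handler) {
        handlers.add(handler);
    }

    /** Runs one cycle until the queue is empty; returns events in processing order. */
    List<String> run(List<String> initialEvents) {
        Deque<String> queue = new ArrayDeque<>(initialEvents);
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            String event = queue.poll();
            processed.add(event);
            for (Function<String, List<String>> handler : handlers) {
                // Events raised by a handler are appended and handled later in the same cycle.
                queue.addAll(handler.apply(event));
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        ProcessingCycle cycle = new ProcessingCycle();
        // Hypothetical handler: reacts to event "a" by raising event "c".
        cycle.register(e -> e.equals("a")
                ? Collections.singletonList("c")
                : Collections.<String>emptyList());
        System.out.println(cycle.run(Arrays.asList("a", "b")));
    }
}
```

Because handlers only react to events and never call each other, the order of calls no longer has to be managed in business code, which is the pain point the CEP approach is meant to remove. A real engine would also need a guard against handlers that keep raising events forever.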

Page 43: System Revolution- How We Did It

Sharklet handler example

43

Page 44: System Revolution- How We Did It

Fundamental decisions

Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - Technology choice -> DIY (in-house)

44

Page 45: System Revolution- How We Did It

Fundamental decisions

Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - Technology choice -> DIY (in-house)
Locking architecture

45

Page 46: System Revolution- How We Did It

Locking - The model

The world

account A

session 1session 1

session 1

session 4

46

Page 47: System Revolution- How We Did It

Locking - Legacy pains

● You must be aware of locking when writing business logic

● A write lock on an account freezes all account operations

● Locking became the bottleneck (not the CPU)

● Bugs

47

Page 48: System Revolution- How We Did It

Locking - Shark solution

● Read/Write lock for session

● Write business logic only - no locking awareness

● No write lock on account - copy on write
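
The copy-on-write idea in the last bullet can be sketched as follows: readers take an immutable snapshot and never block, while an updater copies the data, changes the copy, and publishes it atomically. This is an illustration under assumptions, not Shark's code; `AccountRuntimeData` here is a hypothetical class that stores account data as a string map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

final class AccountRuntimeData {
    // The current immutable version of the account data.
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    /** Sync side: take one consistent snapshot at the start of a processing cycle. */
    Map<String, String> snapshot() {
        return current.get();
    }

    /** Async side: copy the data, modify the copy, then publish it atomically. */
    void update(String key, String value) {
        current.updateAndGet(old -> {
            Map<String, String> copy = new HashMap<>(old);
            copy.put(key, value);
            return Collections.unmodifiableMap(copy);
        });
    }

    public static void main(String[] args) {
        AccountRuntimeData data = new AccountRuntimeData();
        Map<String, String> before = data.snapshot();
        data.update("geo", "UK");
        // The old snapshot is unchanged; new readers see the update.
        System.out.println(before.isEmpty() + " " + data.snapshot().get("geo"));
    }
}
```

A snapshot taken before an update keeps its old contents, so a cycle that started with it sees consistent account data without holding any account-level lock.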

48

Page 49: System Revolution- How We Did It

SYNC - A single proc cycle uses consistent account data copy

ASYNC - updates account data using copy on write pattern

sharklet A (sync handlers)

sharklet A (async handlers)

web visitor agent mobile

visitor

facade

adapter adapter adapter

Account Runtime Data

external resource

49

Page 50: System Revolution- How We Did It

Sharklet example (no locks)

50

Page 51: System Revolution- How We Did It

Fundamental decisions

Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - Technology choice -> DIY (in-house)
Locking architecture -> decided

51

Page 52: System Revolution- How We Did It

We had a good start

52

Page 53: System Revolution- How We Did It

But! We were alone

53

Page 54: System Revolution- How We Did It

LiveEngage - the big decision

54

Page 55: System Revolution- How We Did It

Dream = LiveEngage platform

agent visitor

Chat scale (2K req/sec)

Visitor scale (100K req/sec)

chat linessession state

activity reventschat lines

admin

business rulessee reports

Admin scale (under 1K req/sec)

chat

offline

reporting config monitor and engage

* Business App / Extension

55

Page 56: System Revolution- How We Did It

Rules - from definition to runtime

visitor

activity revents

admin

business rules

config monitor and engage

* Business App / Extension

if the visitor meets the conditions -> invite to chat

56

Page 57: System Revolution- How We Did It

Rules in LiveEngage dream

57

Page 58: System Revolution- How We Did It

What is a rules engine

A rules engine serves as a pluggable software component that executes business rules

These rules are externalized or separated from application code

58

Page 59: System Revolution- How We Did It

Rules engine implementation

Boolean logic is the easy part

59

Page 60: System Revolution- How We Did It

Rules engine implementation

Hard to detect which conditions must be evaluated

new Fact

60

Page 61: System Revolution- How We Did It

Rules engine implementation

Hard to implement a Drools-like DSL

61

Page 62: System Revolution- How We Did It

Rules Engine - How to make it happen?

62

● Drools - eats memory
● Legacy rules engine
  ○ Customer friction is too high
  ○ Not efficient

Page 63: System Revolution- How We Did It

63

Page 64: System Revolution- How We Did It

64

Page 65: System Revolution- How We Did It

GRF - Generic Rules Framework

Conditions and outcomes are building blocks that can be used for complex rules creation

hard coded building blocks

TimeOnPage

GeoLocation

InviteToChat

rule

if (timeOnPage(5) and geoLocation(“US”))
execute {
    inviteToChat()
}
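
The rule above combines two condition building blocks with one outcome. One way to picture such building blocks in Java 8 is as predicates and consumers that a rule composes. The sketch below is a guess at the shape, not the GRF API: `GrfSketch`, `Visitor`, and `Rule` are hypothetical, and the argument of `timeOnPage` is assumed to be in seconds.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

final class GrfSketch {
    /** Hypothetical visitor facts that conditions can inspect. */
    static final class Visitor {
        final int secondsOnPage;
        final String country;
        boolean invited;

        Visitor(int secondsOnPage, String country) {
            this.secondsOnPage = secondsOnPage;
            this.country = country;
        }
    }

    // Condition building blocks.
    static Predicate<Visitor> timeOnPage(int minSeconds) {
        return v -> v.secondsOnPage >= minSeconds;
    }

    static Predicate<Visitor> geoLocation(String country) {
        return v -> country.equals(v.country);
    }

    // Outcome building block.
    static Consumer<Visitor> inviteToChat() {
        return v -> { v.invited = true; };
    }

    /** A rule pairs a (possibly composite) condition with an outcome. */
    static final class Rule {
        private final Predicate<Visitor> condition;
        private final Consumer<Visitor> outcome;

        Rule(Predicate<Visitor> condition, Consumer<Visitor> outcome) {
            this.condition = condition;
            this.outcome = outcome;
        }

        /** Executes the outcome if the condition holds; returns whether it fired. */
        boolean apply(Visitor visitor) {
            if (!condition.test(visitor)) {
                return false;
            }
            outcome.accept(visitor);
            return true;
        }
    }

    public static void main(String[] args) {
        Rule rule = new Rule(timeOnPage(5).and(geoLocation("US")), inviteToChat());
        Visitor visitor = new Visitor(10, "US");
        System.out.println(rule.apply(visitor) + " " + visitor.invited);
    }
}
```

With this shape the Boolean logic comes for free from `Predicate.and`, `or`, and `negate`, which fits the earlier remark that Boolean logic is the easy part of a rules engine.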

65

Page 66: System Revolution- How We Did It

GRF + CEP = RulesEngine

GeoLocation condition

trigger when (geo data is changed)

evaluate(geo, accountConfig) {
    if (geo == accountConfig.geo) TRUE else FALSE
}

Condition type implementor defines the evaluation trigger instead of automatic detection
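
Letting the condition implementor name the trigger means the engine never has to work out which conditions a new fact affects; it only looks up the conditions registered for that event type. The following sketch shows one possible shape of that lookup. All names (`TriggeredConditions`, `Condition`, the `"geoChanged"` event type, the map-based facts and config) are assumptions for illustration, not the Shark rules engine API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TriggeredConditions {
    interface Condition {
        /** Event type on which this condition must be re-evaluated. */
        String trigger();

        boolean evaluate(Map<String, String> facts, Map<String, String> accountConfig);
    }

    /** Mirrors the GeoLocation condition above: true when the visitor geo matches the config. */
    static final class GeoLocationCondition implements Condition {
        @Override
        public String trigger() {
            return "geoChanged";
        }

        @Override
        public boolean evaluate(Map<String, String> facts, Map<String, String> accountConfig) {
            String geo = facts.get("geo");
            return geo != null && geo.equals(accountConfig.get("geo"));
        }
    }

    // Conditions indexed by the trigger their implementor declared.
    private final Map<String, List<Condition>> byTrigger = new HashMap<>();

    void register(Condition condition) {
        byTrigger.computeIfAbsent(condition.trigger(), t -> new ArrayList<>()).add(condition);
    }

    /** Evaluates only the conditions registered for this event type; returns how many hold. */
    int onEvent(String eventType, Map<String, String> facts, Map<String, String> accountConfig) {
        int satisfied = 0;
        for (Condition c : byTrigger.getOrDefault(eventType, Collections.<Condition>emptyList())) {
            if (c.evaluate(facts, accountConfig)) {
                satisfied++;
            }
        }
        return satisfied;
    }

    public static void main(String[] args) {
        TriggeredConditions engine = new TriggeredConditions();
        engine.register(new GeoLocationCondition());
        Map<String, String> facts = Collections.singletonMap("geo", "UK");
        Map<String, String> config = Collections.singletonMap("geo", "UK");
        System.out.println(engine.onEvent("geoChanged", facts, config));
        System.out.println(engine.onEvent("pageChanged", facts, config));
    }
}
```

An event type with no registered conditions costs a single map lookup, which is the efficiency argument for explicit triggers over automatic detection.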

66

Page 67: System Revolution- How We Did It

Shark Rules Engine (Condition)

67

Page 68: System Revolution- How We Did It

GRF - giraffe

GIRAFFE

68

Page 69: System Revolution- How We Did It

SYNC - detects which conditions should be evaluated and triggers GRF

ASYNC - loads rules into the shark rules engine

sharklet A (sync handlers)

sharklet A (async handlers)

web visitor agent mobile

visitor

facade

adapter adapter adapter

Account Runtime Data

Message BUS

Account Config

Rules Engine

69

Page 70: System Revolution- How We Did It

We did a little more

AND

Felt ready to go

70

Page 71: System Revolution- How We Did It

SYNC - CEP, Rules, Report-Sharklet

ASYNC - integrated with account config

sharklet B sharklet B

web visitor agent mobile

visitor

facade

adapter adapter adapter

Rules Engine

Account Config

Account Runtime Data

Message BUS

sharklet A sharklet A

Account Config Service

71

Page 72: System Revolution- How We Did It

Feel the field

Legacy

agent visitor admin

activities

- Silent mode

72

Page 73: System Revolution- How We Did It

The dream comes true

agent visitor

chat linessession state

activity reventschat lines

admin

business rulessee reports

chat

offline

reporting config monitor and engage

* Business App / Extension

73

Page 74: System Revolution- How We Did It

Platform in action

Legacy chat

agent visitor admin

activities

engagements

Account Config

Reports

First small customers

74

Page 75: System Revolution- How We Did It

Shark

We started with a small cluster

And just added servers with business growth

75

Page 76: System Revolution- How We Did It

We recognized major bottlenecks

76

Page 77: System Revolution- How We Did It

And easily fixed them

77

Guy Harel
fixed them
Page 78: System Revolution- How We Did It

Tools and techniques

● Statistics monitoring

● Testing methodology

● Java 8

● Notes about G1

78

Page 79: System Revolution- How We Did It

Statistics monitoring - graphite

79

Page 80: System Revolution- How We Did It

Statistics monitoring - graphite

80

Page 81: System Revolution- How We Did It

Statistics monitoring - metrics

https://github.com/dropwizard/metrics
http://metrics.dropwizard.io

private final Timer responses =
    metrics.timer(name(RequestHandler.class, "responses"));

public String handleRequest(Request request, Response response) {
    final Timer.Context context = responses.time();
    try {
        // etc;
        return "OK";
    } finally {
        context.stop();
    }
}

81

Page 82: System Revolution- How We Did It

Testing methodology

● Unit test - use it
● Integration test - invest here
● System test - try to minimize effort
● Performance
  ○ Integration - worth it
  ○ System - choose your tests

82

Page 83: System Revolution- How We Did It

Performance test logs

83

Page 84: System Revolution- How We Did It

Performance test validations

84

Page 85: System Revolution- How We Did It

Testing methodology

How did we test the platform?

We had
● built the main code with tests in mind
● mocked our clients

85

Page 86: System Revolution- How We Did It

Java 8

● We moved to Java 8 one year ago

● It was easy :)

● Pushed us to
  ○ more expressive code
  ○ functional style
  ○ immutability

Search on YouTube: LivePerson Functional Java 8

86

Page 87: System Revolution- How We Did It

Notes about G1

● Designed for big heaps and minimizes big pauses
● Is considered to be the default GC in Java 9
● We tested our system with G1 when 12 GB was used
  ○ received good results (no big GC pauses)

87

Page 88: System Revolution- How We Did It

88

Page 89: System Revolution- How We Did It

We are happy now

● Horizontal scalability
● Independent and safe business logic development
● Fast development cycles (platform, sharklets, data-model)
● Efficient resource utilization
● Fewer bugs (easier to fix)
● Better QoS
● Overall confidence

89

Page 90: System Revolution- How We Did It

Numbers

Peak statistics        Shark        Legacy
Concurrent visitors    ~ 100K       ~ 1 Million
Request/sec            ~ 11K        ~ 110K
Machines               ~ 34         ~ 700
Cores                  ~ 224        ~ 6300
Cost per visitor       ~ 0.001      ~ 0.006

90

Page 91: System Revolution- How We Did It

Future challenges and ideas

● Better High availability

● Deployment with no downtime

● Management tools

● 100K accounts

91

Page 92: System Revolution- How We Did It

Tips

● Define scope and requirements
● Company commitment is a must
● Work with your clients
● Treat test code as if it runs in production
● Automated perf tests - it helps
● Sometimes DIY is the best solution
● Respect legacy - combine old ideas with new technologies
● Understand the complexity and find the simplest solution

92

Page 93: System Revolution- How We Did It

Never stop dreaming

93

Page 94: System Revolution- How We Did It

THANK YOU!

We are hiring

94

