Date posted: 09-Feb-2017 | Category: Technology | Uploaded by: liveperson
System revolution: How we did it
Victor Perepelitsky
questions: www.meetup.com/ILTechTalks/events/226834931/
slideshare: www.slideshare.net/victorperepelitsky
email: [email protected]
LivePerson customer example
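[Diagram: a salesman and a visitor from the UK exchange chat lines; the visitor's session state and activity events reach the system; a sales manager sets the rule "invite UK visitors to chat" and sees reports; the visitor receives an invite]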
3
LivePerson at a glance
4
● Account (brand) - a LivePerson customer
● Visitor - an individual who interacts with the business owner’s brand
● Agent - an account representative who may interact with visitors (examples: technical support, sales)
● Admin - an account representative who defines the business goals and normally manages agents in order to reach them effectively
LivePerson at a glance
[Diagram: agents and visitors exchange chat lines (chat scale, 2K req/sec); visitors get session state and send activity events (visitor scale, 100K req/sec); admins define business rules and see reports (admin scale, under 100 req/sec); visitors receive invites]
5
Legacy
[Diagram: the same agent, visitor and admin flows and scales, served by two systems: the Real Time Server and Offline and Reporting]
6
Legacy - stateful + account sticky
[Diagram: web servers route a session from account A and a session from account B to the RT server that owns the account; accounts are partitioned across RT servers (A, C), (B, D) and (E, F, G)]
7
Legacy
● Works
● Fast
● Partially resilient
● A huge number of features
8
Legacy - pains
● Hard to scale
● Hard to add new features
● Poor resource utilization
● Poor manageability
● Poor QoS
● Huge friction with customers
9
Let's go back
[Diagram repeated: the agent, visitor and admin flows with their chat, visitor and admin scales]
10
Proper system architecture
[Diagram: the same flows and scales, split across four components: real time, offline, reporting and config]
11
The new dream
[Diagram: agents and visitors exchange chat lines (chat scale, 2K req/sec); visitors send session state and activity events (visitor scale, 100K req/sec); admins define business rules and see reports (admin scale, under 1K req/sec); components: chat, offline, reporting, config, and monitor and engage*]
* Business App / Extension
12
Monitor and engage = Shark
Shark manifesto
● Collects data about individuals (visitors) as they interact with the business owner’s brand (account), and makes it available
● Acts in real time to engage visitors (chat, ad, call, etc.)
● Is a platform for business logic modules (sharklets), which may be developed and deployed independently
13
Fundamental decisions
Requirements?
14
Platform requirements
● E2E latency within DC < 30 ms
● Good resource utilization (CPU > 50%)
● Efficient - at least 500 req/sec per node
● Sharklet development lifecycle is independent
● High availability
○ uptime > 99.99999%
○ data loss < 0.01%
● Resilient - no service downtime when external resource is unavailable (minimal degradation is allowed)
● Business logic correctness - 99.9%
15
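As a back-of-the-envelope check of the uptime requirement (assuming a 365-day year), the allowed downtime per year is:

```latex
(1 - 0.9999999) \times 365 \times 24 \times 3600\,\mathrm{s} = 10^{-7} \times 31\,536\,000\,\mathrm{s} \approx 3.15\,\mathrm{s}
```

So a single outage of a few seconds would consume the whole yearly budget.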
Fundamental decisions
Requirements? -> defined
Stateful or stateless?
16
Stateful
Stickiness is required
[Diagram: sessions 1 to 4, each routed to the server that holds its state]
17
Stateless
[Diagram: sessions 1 to 4, served by any server, with session data kept in a shared store]
Each request potentially requires access to the session data store
18
Facts that helped us to decide
1. Legacy works as “Stateful without HA”
2. A small data loss has a tiny customer impact (0.01% loss is good enough)
3. Stateless requires much more resources and initial effort
4. We can add an HA store in the future
19
Stateful shark
[Diagram: as in the legacy layout, web servers route session A and session B to the RT server that owns their account; accounts are spread across RT servers (A, C), (D), (E, F, G) and (N, B)]
20
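Shark keeps the legacy rule that all sessions of an account are handled by the same RT server. The slides don't show how accounts are placed on servers, so the sketch below is only one possible scheme, assuming a consistent-hash ring keyed by account id. `AccountRouter`, `serverFor` and the server names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical account-sticky router built on a consistent-hash ring:
// every session of an account maps to the same RT server.
final class AccountRouter {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    AccountRouter(List<String> servers, int virtualNodesPerServer) {
        for (String server : servers) {
            for (int i = 0; i < virtualNodesPerServer; i++) {
                // Several points per server spread accounts more evenly
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    // The key is the account id, not the session id: that is what makes routing account-sticky.
    String serverFor(String accountId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers configured");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(accountId));
        Long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey(); // wrap around the ring
        return ring.get(point);
    }

    // First 8 bytes of an MD5 digest as a long; MD5 is used for spread, not for security.
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }
}
```

One property of this choice: when a server joins the ring, each account either stays on its old server or moves to the new one, so most accounts keep their in-memory state. Plain `hash % serverCount` placement would reshuffle most accounts whenever a server is added.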
Fundamental decisions
Requirements? -> defined
Stateful or stateless?
What are the big parts?
21
What are the big parts?
22
Legacy - successful patterns
1. Requests are processed in memory
2. External resources are accessed asynchronously to visitor requests
3. Customer rules and data (AccountConfig) are kept in memory and may be updated in the background
23
Legacy - pains
1. Order of calls (inside code + rules)
2. Business logic modules are not pluggable components
3. HTTP requests are tightly coupled with the logical layers (hard to move to other protocols such as WebSockets)
24
25
SYNC - fast CEP, engagements
ASYNC - slow actions, external resource access
[Diagram components: web visitor, agent and mobile visitor adapters behind a facade; sharklet A (sync handlers); sharklet A (async handlers); Account Runtime Data; message bus; external resource]
26
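The SYNC/ASYNC split can be illustrated with a minimal sketch, assuming a per-session event inbox and a separate thread pool for slow calls. Only the division of work comes from the slide: synchronous handlers stay in memory and never wait on I/O, while asynchronous handlers call external resources and report back through an event. The geo lookup, `SyncAsyncSketch`, `resolveGeoAsync` and `drainSync` are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical SYNC/ASYNC split: sync code never blocks on I/O; slow work runs on a
// separate pool and its result comes back as an ordinary event for a later sync cycle.
final class SyncAsyncSketch {
    static final class Event {
        final String type;
        final String payload;
        Event(String type, String payload) { this.type = type; this.payload = payload; }
    }

    private final BlockingQueue<Event> inbox = new LinkedBlockingQueue<>();
    // Daemon threads so this sketch never keeps the JVM alive on its own
    private final ExecutorService asyncPool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "async-handler");
        t.setDaemon(true);
        return t;
    });

    // ASYNC: call a slow external resource off the request path, then enqueue the result.
    CompletableFuture<Void> resolveGeoAsync(String ip, Function<String, String> slowGeoService) {
        return CompletableFuture
                .supplyAsync(() -> slowGeoService.apply(ip), asyncPool)
                .thenAccept(country -> inbox.add(new Event("geo-resolved", country)));
    }

    // SYNC: fast, in-memory handling of whatever events are already queued; never waits.
    int drainSync(StringBuilder sessionLog) {
        int handled = 0;
        Event e;
        while ((e = inbox.poll()) != null) {
            sessionLog.append(e.type).append('=').append(e.payload).append(';');
            handled++;
        }
        return handled;
    }

    void shutdown() { asyncPool.shutdown(); }
}
```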
Shark - The Big Parts
1. Facade - decouples real world protocols from the logical layers
2. CEP - avoids call order management
3. Sync - very fast in-memory processing
4. Async - allows slow actions and external resource access
5. Account Runtime Store - allows in-memory access to customer configuration
27
Fundamental decisions
Requirements? -> defined
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack
28
Basic technology stack - ?
29
We were practical
CEP technology?
30
CEP - in a nutshell
31
Drools - in a nutshell
32
Drools - we tried to kill it
We had:
● played with it :)
● integrated it into Shark :)
● made a POC using LivePerson logic :)
● tested it for performance :(
33
We played with more technologies
34
And finally chose the solution
35
Shark CEP - processing cycle
[Animation across slides 36-42: handlers 1, 2 and 3 consume events from a shared event queue. The queue starts with events a and b; handling a adds a new event c; the cycle ends when the queue is empty.]
42
Sharklet handler example
43
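The handler example on this slide was an image that did not survive extraction. The sketch below is a hypothetical reconstruction of the idea, not Shark's code: handlers subscribe to event types, a cycle drains the event queue, and an event emitted by one handler (like c in the processing-cycle slides) is processed later in the same cycle, so no handler has to manage call order. Events are plain strings here, and `CepCycle`, `Handler` and `maxEvents` are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical single-threaded processing cycle: handlers subscribe to event types,
// and any event a handler emits is queued and processed in the same cycle.
final class CepCycle {
    // A sharklet-style handler: reacts to one event and may emit follow-up events.
    interface Handler {
        void handle(String event, Consumer<String> emit);
    }

    // Events are plain strings; the string doubles as the event type.
    private final Map<String, List<Handler>> subscribers = new HashMap<>();

    void subscribe(String eventType, Handler handler) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    // Runs one cycle until the queue is empty and returns the events in processing order.
    List<String> run(List<String> initialEvents, int maxEvents) {
        Deque<String> queue = new ArrayDeque<>(initialEvents);
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            if (processed.size() >= maxEvents) {
                // Guard against handlers that keep emitting events forever
                throw new IllegalStateException("cycle did not converge");
            }
            String event = queue.poll();
            processed.add(event);
            for (Handler h : subscribers.getOrDefault(event, Collections.emptyList())) {
                h.handle(event, queue::add); // emitted events go to the tail of the same queue
            }
        }
        return processed;
    }
}
```

For example, with a handler on `a` that emits `c`, running the cycle on `[a, b]` processes `a`, `b`, then `c`, matching the sequence in the animation.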
Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
44
Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
Locking architecture
45
Locking - The model
[Diagram: nested scopes: the world contains accounts such as account A, and each account contains sessions (session 1 to session 4)]
46
Locking - Legacy pains
● You must be aware of locking when writing business logic
● A write lock on the account freezes all account operations
● Locking became the bottleneck (not the CPU)
● Bugs
47
Locking - Shark solution
● Read/Write lock for session
● Write business logic only - no locking awareness
● No write lock on account - copy on write
48
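A minimal sketch of "read/write lock for session" with no locking awareness in business logic, assuming the platform owns one `ReentrantReadWriteLock` per session and runs business code as plain functions under it. `SessionStore`, `read` and `write` are hypothetical names; a real store would also evict ended sessions and their locks.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical platform-side session locking: business logic receives plain session
// state through a function and never sees a lock.
final class SessionStore<S> {
    private final ConcurrentHashMap<String, ReadWriteLock> locks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, S> sessions = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(String sessionId) {
        // One lock per session, created on first use
        return locks.computeIfAbsent(sessionId, id -> new ReentrantReadWriteLock());
    }

    // Readers of the same session may run concurrently; null is passed if the session is unknown.
    <R> R read(String sessionId, Function<S, R> query) {
        ReadWriteLock lock = lockFor(sessionId);
        lock.readLock().lock();
        try {
            return query.apply(sessions.get(sessionId));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers of one session are serialized; other sessions and the account are not blocked.
    void write(String sessionId, S initial, UnaryOperator<S> update) {
        ReadWriteLock lock = lockFor(sessionId);
        lock.writeLock().lock();
        try {
            S current = sessions.getOrDefault(sessionId, initial);
            sessions.put(sessionId, update.apply(current));
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Because the lock is scoped to one session, a slow writer on one session never blocks the other sessions of the same account, which is the legacy pain the slide addresses.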
SYNC - a single processing cycle uses a consistent copy of the account data
ASYNC - updates account data using the copy-on-write pattern
[Diagram components: web visitor, agent and mobile visitor adapters behind a facade; sharklet A (sync handlers); sharklet A (async handlers); Account Runtime Data; external resource]
49
Sharklet example (no locks)
50
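The "no write lock on account" part can be sketched with an immutable snapshot behind an `AtomicReference`: a sync processing cycle reads one snapshot and uses it throughout, and async updates publish a modified copy. This is a generic copy-on-write sketch under that assumption; `AccountRuntimeData`, `Snapshot` and `putRule` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical copy-on-write holder for per-account runtime data.
final class AccountRuntimeData {
    // Immutable snapshot: once published it is never modified, so readers need no lock.
    static final class Snapshot {
        final Map<String, String> rules;
        Snapshot(Map<String, String> rules) {
            this.rules = Collections.unmodifiableMap(new HashMap<>(rules));
        }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(Collections.emptyMap()));

    // SYNC: take one snapshot at the start of a processing cycle and use it throughout.
    Snapshot snapshot() {
        return current.get();
    }

    // ASYNC: copy, modify the copy, publish atomically; readers keep their old snapshot.
    // updateAndGet may re-run the function under contention, so it must have no side effects.
    void putRule(String name, String definition) {
        current.updateAndGet(old -> {
            Map<String, String> copy = new HashMap<>(old.rules);
            copy.put(name, definition);
            return new Snapshot(copy);
        });
    }
}
```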
Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
Locking architecture -> decided
51
We had a good start
52
But! We were alone
53
LiveEngage - the big decision
54
Dream = LiveEngage platform
[Diagram as in "The new dream": chat, visitor and admin flows served by chat, offline, reporting, config, and monitor and engage*]
* Business App / Extension
55
Rules - from definition to runtime
[Diagram: the admin defines business rules in config; visitor activity events reach monitor and engage*]
* Business App / Extension
If the visitor meets the conditions -> invite to chat
56
Rules in LiveEngage dream
57
What is a rules engine?
A rules engine is a pluggable software component that executes business rules.
These rules are externalized, i.e. kept separate from the application code.
58
Rules engine implementation
Boolean logic is the easy part
59
Rules engine implementation
Hard to detect which conditions must be evaluated when a new fact arrives
60
Rules engine implementation
Hard to implement a Drools-like DSL
61
Rules Engine - How to make it happen?
62
● Drools - eats memory
● Legacy rules engine
○ Customer friction is too high
○ Not efficient
63
64
GRF - Generic Rules Framework
Conditions and outcomes are building blocks that can be combined to create complex rules
Hard-coded building blocks: TimeOnPage, GeoLocation (conditions), InviteToChat (outcome)
Rule:
if (timeOnPage(5) and geoLocation("US"))
execute { inviteToChat() }
65
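The rule on the slide can be expressed as a hypothetical Java sketch in which conditions and outcomes are small functions combined by a rule. The method names mirror the slide's building blocks, but the API, the `Visitor` fields and the assumption that `timeOnPage(5)` means "at least 5 seconds" are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical GRF-style building blocks: hard-coded conditions and outcomes
// that a rule combines; names mirror the slide, the API is illustrative.
final class Grf {
    // Minimal visitor state; the field names are assumptions for this sketch.
    static final class Visitor {
        final int secondsOnPage;
        final String country;
        final List<String> actions = new ArrayList<>(); // outcomes record what they did
        Visitor(int secondsOnPage, String country) {
            this.secondsOnPage = secondsOnPage;
            this.country = country;
        }
    }

    interface Condition extends Predicate<Visitor> {}

    interface Outcome {
        void execute(Visitor v);
    }

    // Condition building blocks (time unit assumed to be seconds)
    static Condition timeOnPage(int seconds) { return v -> v.secondsOnPage >= seconds; }
    static Condition geoLocation(String country) { return v -> country.equals(v.country); }
    static Condition and(Condition a, Condition b) { return v -> a.test(v) && b.test(v); }

    // Outcome building block
    static Outcome inviteToChat() { return v -> v.actions.add("inviteToChat"); }

    // A rule: if (when) execute { then }
    static final class Rule {
        final Condition when;
        final Outcome then;
        Rule(Condition when, Outcome then) { this.when = when; this.then = then; }

        boolean fire(Visitor v) {
            if (!when.test(v)) {
                return false;
            }
            then.execute(v);
            return true;
        }
    }
}
```

Usage mirroring the slide: `new Grf.Rule(Grf.and(Grf.timeOnPage(5), Grf.geoLocation("US")), Grf.inviteToChat())` invites a US visitor who has been on the page for 7 seconds, and leaves a UK visitor alone.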
GRF + CEP = RulesEngine
GeoLocation condition
trigger when (geo data is changed)
evaluate(geo, accountConfig) {
    if (geo == accountConfig.geo) TRUE else FALSE
}
The implementor of a condition type defines its evaluation trigger, instead of relying on automatic detection
66
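A sketch of the trigger idea, assuming each condition type exposes the name of the fact change that should re-evaluate it, so the engine only looks conditions up by trigger and does no dependency analysis. The string trigger names, the map-based facts and account config, and the class names are hypothetical; `GeoLocation` follows the slide's pseudocode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trigger index: each condition type declares which fact change re-evaluates it,
// so the engine never has to work out dependencies on its own.
final class TriggeredConditions {
    interface Condition {
        String name();
        String trigger(); // chosen by the condition implementor, e.g. "geo-changed"
        boolean evaluate(Map<String, String> facts, Map<String, String> accountConfig);
    }

    // The slide's GeoLocation example: evaluated only when geo data changes.
    static final class GeoLocation implements Condition {
        public String name() { return "geoLocation"; }
        public String trigger() { return "geo-changed"; }
        public boolean evaluate(Map<String, String> facts, Map<String, String> accountConfig) {
            String geo = facts.get("geo");
            return geo != null && geo.equals(accountConfig.get("geo"));
        }
    }

    private final Map<String, List<Condition>> byTrigger = new HashMap<>();
    private final Map<String, Boolean> lastResult = new HashMap<>();
    int evaluations = 0; // number of evaluate() calls, to show unrelated conditions are skipped

    void register(Condition c) {
        byTrigger.computeIfAbsent(c.trigger(), k -> new ArrayList<>()).add(c);
    }

    // Re-evaluates only the conditions subscribed to this trigger; others keep their cached result.
    void onFactChanged(String trigger, Map<String, String> facts, Map<String, String> accountConfig) {
        for (Condition c : byTrigger.getOrDefault(trigger, Collections.emptyList())) {
            lastResult.put(c.name(), c.evaluate(facts, accountConfig));
            evaluations++;
        }
    }

    boolean isTrue(String conditionName) {
        return lastResult.getOrDefault(conditionName, false);
    }
}
```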
Shark Rules Engine (Condition)
67
GRF - giraffe
GIRAFFE
68
SYNC - detects which conditions should be evaluated and triggers GRF
ASYNC - loads rules into the Shark rules engine
[Diagram components: web visitor, agent and mobile visitor adapters behind a facade; sharklet A (sync handlers); sharklet A (async handlers); Account Runtime Data; message bus; Account Config; Rules Engine]
69
We did a little more
AND
Felt ready to go
70
SYNC - CEP, rules, Report-Sharklet
ASYNC - integrated with Account Config
[Diagram components: web visitor, agent and mobile visitor adapters behind a facade; sharklet A and sharklet B instances; Rules Engine; Account Config; Account Runtime Data; message bus; Account Config Service]
71
Feel the field
[Diagram: Legacy keeps serving agents, visitors and admins; their activities also feed the new platform, which runs in silent mode]
72
The dream comes true
[Diagram: the agent, visitor and admin flows now served by chat, offline, reporting, config, and monitor and engage*]
* Business App / Extension
73
Platform in action
[Diagram: Legacy chat serves agents, visitors and admins; activities flow into the platform, which produces engagements and uses Account Config and Reports]
First small customers
74
Shark
We started with a small cluster
And simply added servers as the business grew
75
We recognized major bottlenecks
76
And easily fixed them
77
Tools and techniques
● Statistics monitoring
● Testing methodology
● Java 8
● Notes about G1
78
Statistics monitoring - graphite
79
Statistics monitoring - graphite
80
Statistics monitoring - metrics
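https://github.com/dropwizard/metrics
http://metrics.dropwizard.io

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import static com.codahale.metrics.MetricRegistry.name;

private static final MetricRegistry metrics = new MetricRegistry();
private final Timer responses =
        metrics.timer(name(RequestHandler.class, "responses"));

// Request and Response are the application's own types
public String handleRequest(Request request, Response response) {
    final Timer.Context context = responses.time();
    try {
        // etc.
        return "OK";
    } finally {
        context.stop(); // records the elapsed time even if handling throws
    }
}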
81
Testing methodology
● Unit tests - use them
● Integration tests - invest here
● System tests - try to minimize the effort
● Performance
○ Integration - worth it
○ System - choose your tests
82
Performance test logs
83
Performance test validations
84
Testing methodology
How did we test the platform?
We:
● built the main code with tests in mind
● mocked our clients
85
Java 8
● We moved to Java 8 one year ago
● It was easy :)
● It pushed us toward
○ more expressive code
○ functional style
○ immutability
Search on YouTube: LivePerson Functional Java 8
86
Notes about G1
● Designed for big heaps; minimizes long pauses
● Slated to become the default GC in Java 9
● We have tested our system with G1 with 12 GB in use and
○ received good results (no long GC pauses)
87
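On Java 8, G1 is not the default and is enabled with `-XX:+UseG1GC`; the maximum heap is set with `-Xmx` (for example `-Xmx12g`, an assumed setting close to the 12 GB mentioned above). A small check of which collectors a running JVM actually uses, based on the standard `java.lang.management` API; the class `GcCheck` is a hypothetical helper:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports which garbage collectors the running JVM uses.
final class GcCheck {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // HotSpot names G1's collectors "G1 Young Generation" and "G1 Old Generation";
    // other JVMs may use different names, so treat this as a heuristic.
    static boolean usingG1() {
        return collectorNames().stream().anyMatch(name -> name.startsWith("G1"));
    }
}
```

On HotSpot started with `-XX:+UseG1GC`, `usingG1()` should return true.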
88
We are happy now
● Horizontal scalability
● Independent and safe business logic development
● Fast development cycles (platform, sharklets, data model)
● Efficient resource utilization
● Fewer bugs (and easier to fix)
● Better QoS
● Overall confidence
89
Numbers
Peak statistics        Shark      Legacy
Concurrent visitors    ~100K      ~1 million
Requests/sec           ~11K       ~110K
Machines               ~34        ~700
Cores                  ~224       ~6300
Cost per visitor       ~0.001     ~0.006
90
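Dividing the approximate peak figures in the table gives rough per-resource ratios. These are derived here from the table, not numbers reported in the talk, and they inherit its "~" precision:

```latex
\begin{aligned}
\text{requests/sec per core:}\quad & \text{Shark } \tfrac{11\,000}{224} \approx 49, && \text{Legacy } \tfrac{110\,000}{6\,300} \approx 17.5 \\
\text{concurrent visitors per machine:}\quad & \text{Shark } \tfrac{100\,000}{34} \approx 2\,940, && \text{Legacy } \tfrac{1\,000\,000}{700} \approx 1\,430 \\
\text{cost per visitor, Legacy/Shark:}\quad & \tfrac{0.006}{0.001} = 6
\end{aligned}
```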
Future challenges and ideas
● Better high availability
● Deployment with no downtime
● Management tools
● 100K accounts
91
Tips
● Define scope and requirements
● Company commitment is a must
● Work with your clients
● Treat test code as if it runs in production
● Automated perf tests help
● Sometimes DIY is the best solution
● Respect legacy - combine old ideas with new technologies
● Understand the complexity and find the simplest solution
92
Never stop dreaming
93
THANK YOU!
We are hiring
94