Monolithic Batch Goes Microservice Streaming
A story about one transformation
Charles Tye & Anton Polyakov
Who are We?
Anton Polyakov
Head of Application Development
2 years at Nordea
Charles Tye
Head of Core Services & Risk IT
17 years at Nordea
Develop solutions for:
Market Risk
Credit Risk
Liquidity Risk
Stress Testing
Messaging
Together with around 70 other people from all over the world
What We Do
Market Risk
The high level view
Quantify potential losses and exposures
Do many small risks add up to a big risk?
Can risks combine in unusual and unexpected ways?
Market Risk
Line of Defence
Protect Nordea and our customers
Daily internal reporting and external reporting to regulators
Independent function
Analysis and insight into the sources of risk
Control of risk
Management of capital
Examples of Risk Analysis
Value at Risk
Look at the last 2 years of market history
Average of the worst 1% of outcomes
Simulate what would happen if the same thing happened again today
Highly non-linear, but we must be able to drill in and find the drivers
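As a rough illustration of the "average of the worst 1%" step, the sketch below treats each replayed historical day as one scenario P&L for today's portfolio and averages the worst 1% of them. The random portfolio and the function name `tail_average_loss` are hypothetical. A production calculator works from full revaluations and supports drill-down; this only shows the arithmetic of the tail average.

```python
import random


def tail_average_loss(pnl, tail_pct=1):
    """Average loss over the worst `tail_pct` percent of scenario P&Ls.

    Returns a positive number for a loss. Integer arithmetic picks the tail size,
    e.g. 500 scenarios at 1% -> 5 worst scenarios (at least one is always used).
    """
    if not pnl:
        raise ValueError("need at least one scenario")
    n_tail = max(1, len(pnl) * tail_pct // 100)
    worst = sorted(pnl)[:n_tail]  # most negative P&L first
    return -sum(worst) / n_tail


# Hypothetical portfolio: one P&L per replayed historical day
# (roughly 2 years of trading days), generated at random for illustration.
random.seed(42)
scenario_pnl = [random.gauss(0.0, 1_000_000.0) for _ in range(500)]
print(f"Average of the worst 1%: {tail_average_loss(scenario_pnl):,.0f}")
```

Because the result depends on which scenarios land in the tail, it cannot simply be added up from the results of sub-portfolios.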
Examples of Risk Analysis
Stress Scenarios
“Black Swan” worst-case scenarios
Unexpected outcomes from future events
Example: Brexit
Simulate what would happen if it occurred
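A stress scenario can be sketched as a set of shocks to risk factors applied to the portfolio. For brevity the sketch below uses a first-order (sensitivity × shock) approximation; the factor names, sensitivities and Brexit-like shock sizes are all hypothetical, and a real stress test would fully revalue non-linear positions rather than rely on linear sensitivities.

```python
# Hypothetical first-order sensitivities: P&L per unit move of each risk factor.
SENSITIVITIES = {
    "GBP_FX": -2_000_000.0,     # P&L per +1.00 move in a GBP exchange rate
    "UK_RATES_BP": 15_000.0,    # P&L per +1 basis point in UK rates
    "UK_EQUITY_PCT": 40_000.0,  # P&L per +1% in UK equities
}

# Hypothetical Brexit-like shock set (illustrative sizes, not a calibrated scenario).
BREXIT_LIKE = {"GBP_FX": 0.25, "UK_RATES_BP": -50.0, "UK_EQUITY_PCT": -5.0}


def stressed_pnl(sensitivities, shocks):
    """Linear approximation of scenario P&L: sum of sensitivity * shock.

    Shocked factors the portfolio has no sensitivity to contribute nothing.
    """
    return sum(sensitivities.get(factor, 0.0) * move for factor, move in shocks.items())


print(f"Stressed P&L: {stressed_pnl(SENSITIVITIES, BREXIT_LIKE):,.0f}")
```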
An Interesting Technology Problem
Consistent
Non-linear
Volume
Speed
Risk analysis: everything has to be included, so you must know when you are complete
Risk does not sum over hierarchies
Drill-down is non-trivial
Traditional OLAP aggregate-and-increment doesn’t work
10,000,000,000,000
Reactive near-real-time calculations
Streaming data
Fast corrections and “what-if”
Interactive sub-second queries on huge data sets
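Why risk does not sum over hierarchies can be checked with a toy example: scenario P&L vectors add up scenario by scenario, but a tail measure computed on the summed vector is not the sum of the children's tail measures. The two desks, the four scenarios and the one-scenario tail are hypothetical.

```python
def tail_average_loss(pnl, n_tail=1):
    """Average loss over the n_tail worst scenarios (positive number = loss)."""
    worst = sorted(pnl)[:n_tail]
    return -sum(worst) / len(worst)


def add_vectors(*vectors):
    """Scenario-by-scenario sum: P&L vectors ARE additive across a hierarchy."""
    return [sum(values) for values in zip(*vectors)]


# Hypothetical scenario P&L vectors for two desks (same 4 scenarios, same order).
desk_a = [-100.0, 50.0, 0.0, 30.0]
desk_b = [80.0, -60.0, 0.0, -10.0]
division = add_vectors(desk_a, desk_b)  # [-20.0, -10.0, 0.0, 20.0]

risk_a = tail_average_loss(desk_a)      # 100.0
risk_b = tail_average_loss(desk_b)      # 60.0
risk_div = tail_average_loss(division)  # 20.0, not 100.0 + 60.0

print(risk_a, risk_b, risk_div, risk_a + risk_b)
```

The desks' losses occur in different scenarios and partly offset each other, which is also why drilling down from the division total to its drivers is not a simple subtraction.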
Challenge No 1.
Find the seams
Break it up
Reusable components
Replace a piece at a time
Spaghetti
Challenge No 2.
Develop a new service
Integrate into the legacy system
Reconcile the output
Find and fix legacy bugs
Fight complification
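Reconciling the output usually means running the legacy path and the new service side by side and diffing their results. The sketch below is a hypothetical key-by-key comparison with a numeric tolerance; the keys, values and tolerances are assumptions. Every break it reports is either a bug in the new service or, as the slide notes, a bug in the legacy system.

```python
import math


def reconcile(legacy, new, rel_tol=1e-9, abs_tol=1e-6):
    """Compare two {key: value} result sets and report every difference."""
    breaks = {}
    for key in sorted(legacy.keys() | new.keys()):
        old, cur = legacy.get(key), new.get(key)
        if old is None or cur is None:
            breaks[key] = ("missing", old, cur)
        elif not math.isclose(old, cur, rel_tol=rel_tol, abs_tol=abs_tol):
            breaks[key] = ("mismatch", old, cur)
    return breaks


# Hypothetical outputs from the legacy batch and from the new service.
legacy_out = {"desk-1": 120.5, "desk-2": 98.0, "desk-3": 15.0}
new_out = {"desk-1": 120.5, "desk-2": 97.2, "desk-4": 3.0}

for key, (kind, old, cur) in reconcile(legacy_out, new_out).items():
    print(f"{key}: {kind} legacy={old} new={cur}")
```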
Challenge No 3.
Batch is synchronous state transfer. The only way to achieve consistency?
Consistency is seriously hard to combine with streaming
Event-sourced, streaming approach
More robust, scalable and faster, especially for recovery
Comes with a cost
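A minimal sketch of the event-sourced idea, assuming an append-only log of numbered events: state is whatever you get by replaying the log, so recovery is a replay, and a correction is just one more event. The sequence-number check hints at part of the cost, because consumers must track ordering and ignore replays. The `Event` and `Projection` types are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int      # position in the append-only log
    key: str      # hypothetical position identifier
    delta: float  # change to apply to that position


@dataclass
class Projection:
    """Read-side state derived only from events."""
    totals: dict = field(default_factory=dict)
    last_seq: int = 0

    def apply(self, event: Event) -> None:
        if event.seq <= self.last_seq:  # already applied, so a replay is a no-op
            return
        self.totals[event.key] = self.totals.get(event.key, 0.0) + event.delta
        self.last_seq = event.seq


def replay(log, state=None):
    """Rebuild (or catch up) a projection by applying events in log order."""
    state = Projection() if state is None else state
    for event in log:
        state.apply(event)
    return state


log = [Event(1, "pos-1", 100.0), Event(2, "pos-2", 50.0), Event(3, "pos-1", -20.0)]
live = replay(log)

# Recovery: a fresh instance rebuilds the same state from the log alone.
recovered = replay(log)

# A correction is just another event; catching up applies only what is new.
log.append(Event(4, "pos-2", 5.0))
replay(log, live)
print(live.totals)  # {'pos-1': 80.0, 'pos-2': 55.0}
```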
Challenge No 4.
Legacy SQL was slow
Partitions and scales out horizontally across commodity hardware.
Tougher challenges on terabyte-scale hardware due to NUMA limitations. Some cubes are already > 200 GB, and larger ones are planned.
Replace with in-memory aggregation
Aggregate billions of scenarios in memory and pre-compute total vectors over hierarchies (linear)
Non-linear measures computed lazily
Reactive and continuous queries
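The split between linear pre-aggregation and lazy non-linear measures can be sketched as follows: scenario vectors are summed up the hierarchy once, and the non-linear measure is computed only when a node is first queried, then cached. The hierarchy, the vectors and the use of the worst scenario loss as a stand-in for the real non-linear measure are hypothetical.

```python
from functools import lru_cache

# Hypothetical hierarchy: parent -> children; leaves carry scenario P&L vectors.
HIERARCHY = {"bank": ["rates", "fx"], "rates": ["desk-1", "desk-2"], "fx": ["desk-3"]}
LEAF_VECTORS = {
    "desk-1": (-50.0, 10.0, 20.0),
    "desk-2": (30.0, -40.0, 5.0),
    "desk-3": (-10.0, -5.0, 40.0),
}


def precompute_totals(node, out):
    """Linear step: a node's vector is the scenario-wise sum of its children's vectors."""
    if node in LEAF_VECTORS:
        out[node] = LEAF_VECTORS[node]
    else:
        child_vectors = [precompute_totals(child, out) for child in HIERARCHY[node]]
        out[node] = tuple(sum(values) for values in zip(*child_vectors))
    return out[node]


TOTALS = {}
precompute_totals("bank", TOTALS)


@lru_cache(maxsize=None)
def worst_loss(node):
    """Non-linear step, computed lazily on first query and cached afterwards."""
    return -min(TOTALS[node])


print(worst_loss("bank"), worst_loss("rates") + worst_loss("fx"))
```

In a reactive setting the cached non-linear results would have to be invalidated whenever an underlying vector changes; this sketch does not handle that.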
Solution: Microservices!
Well, almost…
Single responsibility – replace pieces of legacy from the inside out
Self-contained, with business-functional boundaries
Independent and rapid development – team owns the whole stack
Organisationally scalable – horizontally scale your teams
Flexible and maintainable – evolve the architecture
Smart endpoints and dumb pipes
Innovation and short lifecycles
The problem
• Business:
  • Multi-model Market Risk calculator for the Nordea portfolio
  • VaR at different organisational levels with 5-6 different models in parallel
• IT:
  • 7,000 CPU hours of grid calculation
  • More than 4,000 SQL jobs
  • Graph with more than 10,000 edges
  • Nightly batch flow
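A nightly batch flow of this kind is essentially a dependency graph of jobs run in topological order. The slides do not describe the actual scheduler, so the sketch below is a generic illustration using Python's standard `graphlib`; the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical fragment of a nightly batch: each job maps to the jobs it depends on.
BATCH_GRAPH = {
    "load_trades": set(),
    "load_market_data": set(),
    "price_scenarios": {"load_trades", "load_market_data"},
    "aggregate_desk": {"price_scenarios"},
    "aggregate_bank": {"aggregate_desk"},
    "report_var": {"aggregate_bank"},
}


def run_batch(graph):
    """Return jobs in an order that respects every dependency edge."""
    order = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel now
        for job in ready:
            order.append(job)  # a real scheduler would launch the job here
            sorter.done(job)
    return order


print(run_batch(BATCH_GRAPH))
```

With a graph like this, nothing downstream is delivered until every upstream job has finished, and a failure early in the chain blocks everything after it.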
What did it look like?
• Well, you know. 10 years of development
• In SQL
• No refactoring (who needs it?)
So what to do?
We probably all know the answer (since we are in this section)
- Find logically isolated blocks
- Keep an eye on non-functional aspects
- Think about how they communicate
- Think about what happens if something dies
Not quite “classical” microservices… or are they?
Produce → enrich → aggregate
- Request/response is not feasible
- Synchronous interaction takes too long
- Some results are expensive to reproduce
So we need…
A middleware that:
- “Glues” services together
- Caches important results
- Serves as a coordinator and work distributor
Pub/sub messaging as notifier
[Diagram: Producer → Enricher → Aggregator → consumer; each stage has its own store, and the stages are connected via Redis pub/sub]
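A minimal in-memory sketch of "pub/sub as notifier": each stage writes its result to a shared store and publishes only the key, so the pipe stays dumb and the data lives in the store. `PubSub` and the dict `store` are stand-ins for Redis, and the enrichment rule is hypothetical. Real Redis pub/sub delivers at most once: a subscriber that is down when a message is published never sees it, which is exactly the problem the next slides deal with.

```python
from collections import defaultdict


class PubSub:
    """Stand-in for Redis pub/sub: notifications go only to currently subscribed callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)


store = {}  # stand-in for the shared store: stages exchange data here, not via messages
bus = PubSub()
totals = defaultdict(float)


def producer(trade_id, notional):
    key = f"raw:{trade_id}"
    store[key] = {"trade_id": trade_id, "notional": notional}
    bus.publish("raw", key)  # notify with the key only


def enricher(key):
    record = dict(store[key])
    # Hypothetical enrichment rule: assign a desk from the trade size.
    record["desk"] = "desk-1" if record["notional"] < 1_000 else "desk-2"
    new_key = f"enriched:{record['trade_id']}"
    store[new_key] = record
    bus.publish("enriched", new_key)


def aggregator(key):
    record = store[key]
    totals[record["desk"]] += record["notional"]


bus.subscribe("raw", enricher)
bus.subscribe("enriched", aggregator)

for trade_id, notional in [("t1", 500.0), ("t2", 2_000.0), ("t3", 250.0)]:
    producer(trade_id, notional)

print(dict(totals))  # {'desk-1': 750.0, 'desk-2': 2000.0}
```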
But…
There are two main problems in distributed messaging:
2) Guarantee that each message is only delivered once
1) Guarantee message order
2) Guarantee that each message is only delivered once
Queues with atomic operations
BRPOPLPUSH
[Diagram: Producer and Enricher exchanging messages through an incoming queue and a processing queue, with a store and Redis pub/sub]
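The point of BRPOPLPUSH is that a message moves from the incoming list to a processing list in one atomic step, so a crashed worker never takes the only copy with it. The sketch below models that reliable-queue pattern in memory; in this single-threaded model the pop and push cannot interleave, whereas in Redis the command itself provides the atomicity. The class and method names are hypothetical. Since Redis 6.2, BRPOPLPUSH is documented as deprecated in favour of BLMOVE with the RIGHT and LEFT arguments.

```python
from collections import deque


class ReliableQueue:
    """In-memory model of the Redis reliable-queue pattern (incoming + processing lists)."""

    def __init__(self):
        self.incoming = deque()    # producers LPUSH here; consumers take from the tail
        self.processing = deque()  # in-flight messages, newest on the left

    def push(self, msg):
        self.incoming.appendleft(msg)  # LPUSH incoming msg

    def pop_to_processing(self):
        """Model of RPOPLPUSH incoming processing (BRPOPLPUSH would block when empty)."""
        if not self.incoming:
            return None
        msg = self.incoming.pop()        # take the oldest message from the tail ...
        self.processing.appendleft(msg)  # ... and park it in the processing list
        return msg

    def ack(self, msg):
        self.processing.remove(msg)  # like LREM processing 1 msg, once the result is stored

    def recover(self):
        """Requeue unacknowledged messages at the tail, oldest first, for redelivery."""
        while self.processing:
            self.incoming.append(self.processing.popleft())


q = ReliableQueue()
for m in ["m1", "m2", "m3"]:
    q.push(m)

first = q.pop_to_processing()        # "m1"
q.ack(first)                         # processed and stored successfully
second = q.pop_to_processing()       # "m2", but the worker crashes before ack
q.recover()
redelivered = q.pop_to_processing()  # "m2" again: at-least-once delivery
print(first, second, redelivered)
```

Redelivery after a crash means a message can be processed twice, which is why deduplication matters.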
Sets and hash maps – all good for dedup
In an eventually consistent world, dedup is your best friend
[Diagram: Enricher writes to the store with HSET; multiple inserts due to recovery, consistent state due to dedup]
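A minimal sketch of why hashes and sets help with dedup: writing the same hash field twice leaves the same state, and a set of processed message ids lets a non-idempotent step (here a running sum) skip duplicates. `Store` is an in-memory model of HSET and SADD return values, not a Redis client, and the trade ids and amounts are hypothetical.

```python
class Store:
    """In-memory model of the Redis hash and set commands used for dedup."""

    def __init__(self):
        self.hashes = {}  # key -> {field: value}
        self.sets = {}    # key -> set of members

    def hset(self, key, field, value):
        """Like HSET: overwrite the field; return 1 if the field is new, else 0."""
        fields = self.hashes.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        return int(is_new)

    def sadd(self, key, member):
        """Like SADD with one member: return 1 if it was added, 0 if already present."""
        members = self.sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1


store = Store()
deliveries = ["t1", "t2", "t1"]  # t1 arrives twice, e.g. replayed after a recovery
desk_total = 0.0

for trade_id in deliveries:
    # Idempotent write: the second t1 just overwrites the same field.
    store.hset("enriched", trade_id, "desk-1")
    # Non-idempotent step (a running sum) is guarded by a processed-ids set.
    if store.sadd("processed", trade_id):
        desk_total += 100.0  # hypothetical fixed amount per trade

print(store.hashes["enriched"], desk_total)
```

Against a real Redis, the SADD check and the update it guards would need to happen atomically (for example in a Lua script) to stay consistent if a worker dies between them.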
So how to scale out?
Logically: Enricher <type A>, Enricher <type B>, …, Enricher <type X>, each filtering its own events from Redis pub/sub
Concurrently: Aggregator <day 1>, Aggregator <day 2>, Aggregator <day 3>, stealing work, coordinated with RedLock + TTL
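The two scaling styles can be sketched together. Logical scale-out is a filter: each enricher keeps only its own event type. Concurrent scale-out is work stealing: aggregators grab whichever day is not locked, and the TTL means a dead worker's day becomes available again. `SingleNodeLock` models only one node's `SET key value NX PX ttl` plus a token-checked release; RedLock proper repeats the acquisition on several independent Redis nodes and needs a majority. All names here are hypothetical.

```python
import time
import uuid


class SingleNodeLock:
    """Model of one Redis node's SET-NX-with-TTL lock and token-checked release."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.locks = {}  # resource -> (token, expires_at)

    def acquire(self, resource, ttl_seconds):
        now = self.clock()
        holder = self.locks.get(resource)
        if holder is not None and holder[1] > now:
            return None  # a live lock is held by someone else
        token = uuid.uuid4().hex  # unique value proving ownership on release
        self.locks[resource] = (token, now + ttl_seconds)
        return token

    def release(self, resource, token):
        holder = self.locks.get(resource)
        if holder is not None and holder[0] == token:
            del self.locks[resource]
            return True
        return False  # expired and taken over, or never ours


def my_events(events, my_type):
    """Logical scale-out: each enricher instance keeps only events of its own type."""
    return [event for event in events if event["type"] == my_type]


def next_free_day(lock, days, ttl_seconds):
    """Concurrent scale-out: an aggregator takes the first day nobody else holds."""
    for day in days:
        token = lock.acquire(f"aggregate:{day}", ttl_seconds)
        if token is not None:
            return day, token
    return None, None


fake_now = [0.0]
lock = SingleNodeLock(clock=lambda: fake_now[0])
days = ["day-1", "day-2", "day-3"]
day_a, token_a = next_free_day(lock, days, ttl_seconds=30)  # aggregator A gets day-1
day_b, token_b = next_free_day(lock, days, ttl_seconds=30)  # aggregator B gets day-2
fake_now[0] = 31.0  # A dies silently; its lock expires after the TTL
day_c, token_c = next_free_day(lock, days, ttl_seconds=30)  # C steals day-1
print(day_a, day_b, day_c)
```

A live worker would refresh its TTL while it is still busy; here B never does, so its lock on day-2 also lapses at t = 31.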
Demo
[Diagram: Producer → Enricher → Aggregator → consumer, each with its own store; Redis pub/sub, incoming and processing queues, RedLock + TTL]
The Result and What We Learned
Success!
• Aggregate and produce risk: 5 hours → 30 mins
• Corrections: 40 mins → 1 second
• Earlier deliveries – more time to manage the risks
• Faster recovery from problems
• Happy risk managers
Integrating new services into the existing system is important (and painful)
Consistency is hard to combine with streaming (maybe the subject of another talk)
When distributing, remember the First Law of Distributed Object Design (do you remember it?)
The Result and What We Learned
First Law of Distributed Object Design:
"don't distribute your objects"
And of course…
https://dk.linkedin.com/in/charles-tye-a8aa88b
https://github.com/parallelstream/