Date posted: 15-Jul-2015
Category: Software
Uploaded by: basware-belgium
Introducing a reactive Scala-Akka based system in a Java centric company
Basware Belgium NV
Jeroen Verellen (@jeroen_v_)
Milan Aleksić (@milanaleksic)
Basware Metrics system and dashboard
A journey through Akka and spray covering:
- actor development
- testing
- spray routing
- acceptance testing
- build
- micro benchmarking
- and more...
Agenda
- Business case
- Requirements
- Concepts
- Technologies
- Evolution through commit log
- Future changes
- Q&A
Business case
We want a real-time dashboard that shows the number of documents coming in and going out (per channel).
It should also list the number per document type. All of this should be visible in a per-hour view.
Requirements
- Basically, a lightweight replacement for a data warehouse / reporting tool
- Highly concurrent and non-blocking: no influence on the other systems that generate the metrics
- The aggregated metrics should be stored so that the system can recover its state after a restart. The store should be simple.
- The system should run on the Java 8 runtime, like all our other (newer) components
- A simple API that lets us show metrics in Dashing, since that is the dashboard technology of choice
Mockup
Some concepts we liked
and planned on using
Pre-Aggregate
Calculate a number of statistics while the metrics are coming in.
The system does not store raw data but calculates, e.g., how many times a service was called in the last hour.
We didn't actually use MongoDB in our case; this was just a design pattern we liked.
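As a rough illustration of pre-aggregation, the sketch below keeps one counter per minute of an hour and never stores the raw metrics. The names `HourlyCounter`, `record` and `between` are hypothetical and not taken from the project.

```scala
import scala.collection.mutable

// Hypothetical sketch: pre-aggregated counters for one hour, one bucket per minute.
final class HourlyCounter {
  // minute of the hour (0..59) -> number of increments seen in that minute
  private val buckets = mutable.Map.empty[Int, Long].withDefaultValue(0L)

  // Update the aggregate as the metric arrives; the raw metric is not kept.
  def record(minute: Int, amount: Long = 1L): Unit = {
    require(minute >= 0 && minute <= 59, s"minute out of range: $minute")
    buckets(minute) += amount
  }

  // Total for the whole hour.
  def total: Long = buckets.values.sum

  // Total for an inclusive range of minutes.
  def between(from: Int, to: Int): Long = (from to to).map(m => buckets(m)).sum
}
```

A query for "the last hour" then only reads at most 60 numbers, however many metrics were reported.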
Event sourcing
"Capture all changes to an application state as a sequence of events" (Fowler).
Instead of storing the latest application state, the system stores the events that change the state. Upon a query for the application state, the state is rebuilt from the events.
Advantages:
- temporal query support
- event replay in case of bugs
- code changes / complete rebuild of state
- reverse / undo events
Difficulties:
- mind shift
- interaction with other, non-event-sourced applications
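A minimal sketch of the idea in plain Scala (no Akka), assuming a hypothetical `MetricIncremented` event: the state is never stored, only derived by folding over the event log, which also gives temporal queries almost for free.

```scala
object EventSourcingSketch {
  // Hypothetical event type: one event per reported increment.
  final case class MetricIncremented(metric: String, epochMinute: Long, amount: Long)

  // metric name -> running total
  type State = Map[String, Long]

  // Apply a single event to the state.
  def applyEvent(state: State, e: MetricIncremented): State =
    state.updated(e.metric, state.getOrElse(e.metric, 0L) + e.amount)

  // Current state: replay every event from the beginning.
  def replay(events: Seq[MetricIncremented]): State =
    events.foldLeft(Map.empty[String, Long])(applyEvent)

  // Temporal query: the state as it was at a given minute.
  def stateAt(events: Seq[MetricIncremented], epochMinute: Long): State =
    replay(events.filter(_.epochMinute <= epochMinute))
}
```

Fixing a bug in `applyEvent` and replaying the same log is what "event replay in case of bugs" amounts to in this model.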
CQRS
Command Query Responsibility Segregation
"At its heart is the notion that you can use a different model to update information than the model you use to read information" (Martin Fowler)
System interactions arrive in the form of commands:
- some commands can be rejected (e.g. validation failure)
- successful commands are stored in the persistence layer
Another part of the system can receive non-rejectable events:
- replayed in order on the query object side
- to avoid the cost of replaying all events, we use snapshots
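The split between rejectable commands and non-rejectable events can be sketched as follows. This is a plain-Scala model under assumed names (`AddMetricCmd`, `MetricAdded`, `Snapshot` are hypothetical); it only shows the shape of the pattern, not the project's code.

```scala
object CqrsSketch {
  // Command side: a request that may still be rejected.
  final case class AddMetricCmd(metric: String, amount: Long)
  // Event: a fact that has already happened and cannot be rejected.
  final case class MetricAdded(metric: String, amount: Long)

  // Validation: Left is a rejection, Right is the event to persist.
  def handle(cmd: AddMetricCmd): Either[String, MetricAdded] =
    if (cmd.metric.trim.isEmpty) Left("metric name must not be empty")
    else if (cmd.amount <= 0) Left(s"amount must be positive: ${cmd.amount}")
    else Right(MetricAdded(cmd.metric, cmd.amount))

  // Query side: totals as of the first `lastSeqNr` events of the journal.
  final case class Snapshot(totals: Map[String, Long], lastSeqNr: Int)

  // Read model = snapshot + replay of only the events recorded after it.
  def readModel(snapshot: Snapshot, journal: Vector[MetricAdded]): Map[String, Long] =
    journal.drop(snapshot.lastSeqNr).foldLeft(snapshot.totals) { (acc, e) =>
      acc.updated(e.metric, acc.getOrElse(e.metric, 0L) + e.amount)
    }
}
```

Starting from a snapshot gives the same read model as replaying everything, just with fewer events to process.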
It composes well with Event Sourcing and the Actor Model.
It is not an architecture, it's a pattern!
Some technologies we liked
and planned on using
Many reasons why we like Scala, just to list some:
- modern functional / OOP mix
- stable, but moving faster than Java
- traits > interfaces
- chaining and composing Futures
- case classes
- pattern matching
All in all, more expressive in less code
Akka is a toolkit that promises scalability via Actors
- each actor can have internal state
- which can be changed only via messages
- which are executed in order
- supervision strategies, event buses, remote actors, cluster...
Vertical and horizontal scalability using a single paradigm
Cons:
- paradigm switch, takes time to learn
- a general concurrency pattern, but doesn't fit every use case
Some of the good sides of Spray:
- builds on Akka, Akka IO
- both client-side and server-side APIs
- case classes for requests, status codes
- declarative routing DSL
The Spray library is going to be republished as akka-http
- Most of the things we did could (and should) be migrated to akka-http as this new library becomes GA
- Important thing it currently can't do: WebSockets
- DSL can cause compile issues in IDEA
- JSON support relies heavily on implicits, making it hard to debug
Pure Java client
We (obviously) needed a way to push ("report") the metrics data to the server
What we came up with is simple: an in-memory bounded queue of sending tasks
The client side needs to be lightweight and non-intrusive
- we would also batch increments before sending them
- we delegated the implementation of this to Joeri, our colleague
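The real client is pure Java and queues sending tasks; the sketch below is a simplified Scala model of the same idea that queues individual increments instead. All names (`MetricsReporter`, `increment`, `flush`) are hypothetical, and dropping increments when the queue is full is an assumption about what "non-intrusive" means here.

```scala
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical client: `send` stands in for the HTTP call to the metrics server.
final class MetricsReporter(capacity: Int, send: Map[String, Long] => Unit) {
  private val queue = new ArrayBlockingQueue[String](capacity)
  private val dropped = new AtomicLong(0)

  // Called by application threads; never blocks.
  // Assumption: when the queue is full the increment is dropped, not retried.
  def increment(metric: String): Boolean = {
    val accepted = queue.offer(metric)
    if (!accepted) dropped.incrementAndGet()
    accepted
  }

  def droppedCount: Long = dropped.get()

  // Called periodically from one background thread: drain the queue,
  // batch the increments per metric name, and send one request.
  def flush(): Unit = {
    val batch = mutable.Map.empty[String, Long].withDefaultValue(0L)
    var next = queue.poll()
    while (next != null) {
      batch(next) += 1L
      next = queue.poll()
    }
    if (batch.nonEmpty) send(batch.toMap)
  }
}
```

The bounded queue caps memory use on the reporting side, so a slow or unavailable metrics server cannot affect the system that generates the metrics.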
ScalaTest
- Functional
- Many test styles
  - FlatSpec for unit testing
  - FeatureSpec for AT
  - Many more available: WordSpec...
Typesafe Config
Configuration library for JVM languages.
- No dependencies
- Java properties, JSON, and a human-friendly JSON superset
- Support for nesting
Evolution through commit log
Contract first: define JSON in/out of the API
Use simple .sbt file
commit dc597f5a959d844c1c98b459ce5db5192b3dfc9a
Date: Mon Oct 20 16:47:04 2014 +0200
    project structure + schemas
commit 81956d778e1bafc0e0decd2a74dd7e8d0e9ce5cf
Date: Mon Oct 20 17:15:36 2014 +0200
    allow to post multiple metrics
New libraries
- %% notation of dependencies
- Typesafe config
- Spray: spray-can, spray-routing, spray-json
commit 87d3f08716a29922a26df82ec1f23e552bd18ed1
Date: Wed Oct 22 11:19:32 2014 +0200
    use base unit test class DRY
commit 1896d9d7229bbe5fd7a038c1c6661fe3c5ae8740
Date: Wed Oct 22 11:04:39 2014 +0200
    first implementation of a route for handling metrics posts
Unit and acceptance testing
- TDD from here on
- Use API example in test (test the documentation)
- ScalaTest, spray-client, Akka TestKit
- Re-use spray-json on client side
- Server and client run in same Actor System
commit ddb2d5a8fd47b9042d906ca75c26eab9f0483d88
Date: Wed Oct 22 17:51:32 2014 +0200
    add first acceptance test
commit 0ada7ce978f28ecd58d077e3f10785e0ad8e559c
Date: Thu Oct 23 13:29:20 2014 +0200
    add test for invalid metrics
commit 657264839cdb948df508b0fefb955ef2a89422a2
Date: Thu Oct 23 12:33:21 2014 +0200
    split of the AT tests, using example from API def in test
SBT setup remodeling
- Script -> Scala
- Stolen from Spray
- clearer separation: modules, dependencies, build settings
commit b204e81c6c42837792e6292f4d6a494a5efa1cfa
Date: Thu Oct 23 11:28:33 2014 +0200
    improve sbt setup, stole from spray setup
JSON Add Metric, Spray REST route, JSON support, unit and acceptance testing
Dashing
Re-use the company standard for dashboards
- Easier adoption in the company
- Ruby gem (Sinatra, Batman.js, CoffeeScript, SCSS... full hipster)
- Currently running on our Dashing server
It's Ruby, but simple enough
Actor tree overview
[Figure: actor tree]
CounterMetricsActor
  CounterMetricsActorName (metric "outgoingAS2")
    CounterMetricsIntervalActor YEAR = 2015
      CounterMetricsIntervalActor MONTH = 11
        CounterMetricsIntervalActor DAY = 11
          CounterMetricsIntervalActor HOUR = 01, BUCKETS = [00..59]
  CounterMetricsActorName (metric "incomingHTTP")
  CounterMetricsActorName (metric "messagetypeinvoice")
Further branches shown in the figure: YEAR = 2014 and YEAR = 2015; MONTH = 10 and MONTH = 12; DAY = 10; HOUR = 02 and HOUR = 03, each with BUCKETS = [00..59].
Shaping the Actor System
- Documentation, testing
- "Add metrics" / "Report metrics" calls introduced
commit 182b3394db9c628a2ab26a62c6eeb351a360657c
Date: Thu Oct 23 20:39:35 2014 +0200
    add post method for getting metric reports
- We put a thin facade between the API and the actors
- Case classes / model on InternalAPI
- DTO objects on ExternalAPI
- We used Cmd and Query as suffixes for commands and queries
- Starting from this commit, Milan gets more involved with the server (Scala) side
commit 6326978260986f90dd8cb0796ef105cd945aded2
Date: Mon Oct 27 12:43:32 2014 +0100
    Introducing internal API class. Replacing Request/Response case classes with command/view case classes. Using ScalaMock to test the api entry point
commit 1601c2de3a0d8b69fc683930d7f1e3235ba09604
Date: Mon Oct 27 14:03:30 2014 +0100
    CR-based improvements
commit 3fc2363817c6b013a26feaef783835fc89dd48ba
Date: Mon Oct 27 15:40:48 2014 +0100
    making first Command & Query case classes in the place of incoming DTO objects
CR-1615, CR-1618
Shaping the Actor System, child instantiation & caching
- First actor tree
- Keep your own reference cache
- Create children
- Watch children
- Act on "Terminated"
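The four steps above (cache, create, watch, act on "Terminated") could look roughly like this with classic Akka actors. The message and class names (`Record`, `CountChildren`, `MetricChild`, `MetricParent`) are hypothetical; only `context.actorOf`, `context.watch`, `forward` and `Terminated` are Akka API.

```scala
import akka.actor.{Actor, ActorRef, Props, Terminated}
import scala.collection.mutable

// Hypothetical messages, not the project's protocol.
final case class Record(metric: String)
case object CountChildren

// Hypothetical leaf actor: counts what it receives.
class MetricChild extends Actor {
  private var count = 0L
  def receive: Receive = { case Record(_) => count += 1 }
}

class MetricParent extends Actor {
  // Keep your own reference cache: metric name -> child reference.
  private val children = mutable.Map.empty[String, ActorRef]

  private def childFor(metric: String): ActorRef =
    children.getOrElseUpdate(metric, {
      val child = context.actorOf(Props(classOf[MetricChild])) // create child
      context.watch(child)                                     // watch child
      child
    })

  def receive: Receive = {
    case msg @ Record(metric) => childFor(metric).forward(msg)
    case CountChildren        => sender() ! children.size
    case Terminated(ref) =>
      // Act on "Terminated": evict the stopped child so it can be re-created later.
      children.find(_._2 == ref).foreach { case (name, _) => children -= name }
  }
}
```

Keeping an explicit map avoids depending on actor names for lookup; the price is that the cache has to be cleaned up when a child stops.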
commit a82584263da9d0dae8a54bc1dd510a95bbc45790
Date: Mon Oct 27 17:36:39 2014 +0100
    start with actor tree
commit ac3167bf9b9d491235e9a8c97ee89b24e20d92f3
Date: Tue Oct 28 11:18:22 2014 +0100
    handle unknown messages, check cache is empty
commit c776d5fcf814372350f3d9d9121e2858342efe58
Date: Tue Oct 28 13:11:01 2014 +0100
    add child factory with default impl
commit 43cbe21a418cda4d01d767cd2aeccbfc4b263107
Date: Tue Oct 28 17:07:48 2014 +0100
    add support for query reports
Akka Persistence Design Overview
[Figure: message flow through the API and the actor tree; the numbers give the order of the messages]
Components:
- MetricsExternalApi (addMetrics, getReports): exposes a REST/JSON interface towards metrics clients
- AkkaInteralApi implements InternalApi
- CounterMetricsActor: there is only one. It caches actor children per metric name. This actor sends a scheduled message to its children to SNAPSHOT their current state (an optimization in CQRS systems to make "reviving" faster).
- CounterMetricsActorName (metric "bwincomingONP")
- CounterMetricsIntervalActor YEAR = 2014 > MONTH = 11 > DAY = 11 > HOUR = 01, BUCKETS = [00..59]. The HOUR actor is special because it keeps the minutes' information inside "buckets" (a hashmap), making the minute our smallest possible precision.
Messages:
- (1) AddMetricsCmd / <no reply>
- (1) ReportMetricsQuery, answered by (12) ReportMetricsQueryResult
- (2) RecordCounterMetric
- (2) GetCounterMetricReport, answered by (11) CounterMetricReport
- (3), (4), (5), (6) RecordCounterMetric, passed down the interval actors
- (3), (4), (5), (6) GetCounterMetricReport, answered by (10), (9), (8), (7) CounterMetricReport respectively
Notes in the figure:
- Each HTTP request can contain multiple metric queries. Each metric query makes a single "TimeScopeQuery" tree structure that gets partially processed by the appropriate child actors through the GetCounterMetricReport message. The results of all queries are processed asynchronously and gathered into a single HTTP response.
- Both Actor and Interval actors have Interval actor children (cached by their value, e.g. hour or minute number). These child actors can die if they are not queried for a long enough period. A parent actor can die only when it has no more children. Actors can be revived to their previous state via akka-persistence.
- Persistent actors keep state between restarts: store events, delete old events; create snapshot, delete old snapshots.
Persistence framework:
- Journal Plugin (in memory): keeps track of the last few events, the rest is thrown away
- Snapshot Plugin (file system): keeps the last snapshot, throws away older snapshots
Introducing Akka Persistence
Choosing plugins
- Journal in memory, snapshots on disk
Actor becomes PersistentActor
- receiveRecover: SnapshotOffer and/or journal entry
- receiveCommand: work and call persist
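A minimal sketch of such a PersistentActor, written against the classic Akka Persistence API of Akka 2.4 and later (the 2014 API differed in details such as the `deleteMessages` signature). The protocol (`Increment`, `Incremented`, `MakeSnapshot`, `GetTotal`) and the class name are hypothetical, and a journal plugin, a snapshot plugin and a serializer for the persisted types are assumed to be configured.

```scala
import akka.persistence.{PersistentActor, SaveSnapshotSuccess, SnapshotOffer, SnapshotSelectionCriteria}

// Hypothetical protocol, not the project's messages.
final case class Increment(minute: Int)   // command
final case class Incremented(minute: Int) // event stored in the journal
case object MakeSnapshot
case object GetTotal

class HourCounterActor(id: String) extends PersistentActor {
  override def persistenceId: String = id

  private var buckets = Map.empty[Int, Long] // minute -> count

  private def update(e: Incremented): Unit =
    buckets = buckets.updated(e.minute, buckets.getOrElse(e.minute, 0L) + 1L)

  // Recovery: an optional snapshot first, then the journal entries recorded after it.
  override def receiveRecover: Receive = {
    case SnapshotOffer(_, snapshot: Map[Int, Long] @unchecked) => buckets = snapshot
    case e: Incremented                                        => update(e)
  }

  // Normal operation: do the work by persisting an event, then update the state.
  override def receiveCommand: Receive = {
    case Increment(minute) => persist(Incremented(minute))(update)
    case MakeSnapshot      => saveSnapshot(buckets)
    case SaveSnapshotSuccess(meta) =>
      // Events and snapshots older than the new snapshot are no longer needed.
      deleteMessages(meta.sequenceNr)
      deleteSnapshots(SnapshotSelectionCriteria(maxSequenceNr = meta.sequenceNr - 1))
    case GetTotal => sender() ! buckets.values.sum
  }
}
```

State is only changed inside the `persist` callback and in `receiveRecover`, so a revived actor ends up with the same counters as the one that was stopped.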
commit c67826ade4b8b32dd7b67cd5c05e06529b0cfedb
Date: Tue Oct 28 10:40:05 2014 +0100
    adding akka persistence dependency
Initial commit where we tried to split actors into name- and time-based ones:
- the root actor in the tree decides how to delegate based on the name (of the metric)
- the second level (and deeper) decides based on time
There was still a lot of work to be done
commit 5dbff87119593cba7db6eef32b595135317bc17f
Date: Tue Oct 28 16:00:56 2014 +0100
    time scope actors introduced
Run localhost
Main class
- Easier startup from IDE
- Local testing front end
Client simulator
- Easier local testing
- Used for load testing later on
commit c907474d7846865ddc277550b3a6cdaaf36c4ee8
Date: Wed Oct 29 09:02:52 2014 +0100
    have our own main: easier for standalone or IDE usage
commit 2954db34ae44b52d9b4087e7ba74118871ea1f70
Date: Wed Oct 29 10:03:10 2014 +0100
    utilizing localhost from a running project, correcting URLs
commit 463499a493f575b3817313640014a1c45431da09
Date: Wed Oct 29 10:23:36 2014 +0100
    add client simulator
Extend report/query functionality
- Still waging war with the journal: too much logging
- Stabilizing the number of "with"s we are using with the common trait BaseMetricsActor
- Pre-calculation of the "query tree" when a query comes in
  - Delegation to children only when needed
  - BucketScope vs RangedScope vs FullScope
  - Still to find a better, Scala-idiomatic way to do it
commit 3d92414b3c972b4b91c608287929e76a63e84a38
Date: Fri Oct 31 11:52:04 2014 +0100
    query recording runs across year/month/date actors
... and many others
Packaging
First idea: Akka micro kernel
Settled for: uber-jar
commit 1c0c8cebd6cf00cd6c748bd9d1c5fddb4a0f7ab3
Date: Mon Nov 3 13:21:24 2014 +0100
    allow assembly and dependencytree plugins
Akka Persistence Part II
- System scheduler: triggers snapshot creation
- "Make snapshot" message sent from the root actor
- Each actor in the tree is able to fan out to its children
- Each actor sends a PoisonPill to itself after making a snapshot
  - this decision will have performance consequences
- Keep your own children's references
  - interesting bugfix by Jeroen
commit 5ad1d4e36fcc46919c125950ea7399256635e989
Date: Mon Nov 3 17:00:04 2014 +0000
    snapshots should work from this point on
commit 7d588f29fa6f222bd1fa54a82fde3e932106c22f
Date: Tue Nov 4 12:52:08 2014 +0100
    Fix fanout of MakeSnapshot
    use the cache to get the children since in testing the children are not registered
Akka Persistence and Acceptance testing
- Random snapshot directory for acceptance testing
- Remove snapshot directory after test
- Execute cleanup from SBT
commit 8b577a80dbb4c0a23281bde25673d1927a563365
Date: Tue Nov 4 11:55:26 2014 +0000
    some randomization introduced into the system and snapshot directory removal moved to a SBT task
Fixing memory issues
Related to the performance issue:
- the actor got killed on every snapshot
- revival of an actor is IO intensive
- once-a-minute peaks in VisualVM
What we changed:
- Make sure actors die only after 30 minutes without use
- Revive and restore state when needed
commit a8750fa7a3ac3557a4028f580b216a9383d32610
Date: Wed Nov 5 12:00:26 2014 +0100
    Make sure actors die if not used for 30 minutes: constrain memory usage
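One way to let an actor stop itself after a period without messages is Akka's receive timeout. Whether the project used this mechanism is not stated, so the sketch below is an assumption; `Ping` and `IdleStoppingActor` are hypothetical names.

```scala
import akka.actor.{Actor, ReceiveTimeout}
import scala.concurrent.duration._

case object Ping // hypothetical message

class IdleStoppingActor(idleAfter: FiniteDuration) extends Actor {
  // Akka delivers ReceiveTimeout when no message has arrived for `idleAfter`.
  context.setReceiveTimeout(idleAfter)

  def receive: Receive = {
    case Ping           => sender() ! "pong"
    case ReceiveTimeout => context.stop(self) // unused for too long: free the memory
  }
}
```

With persistence in place, a stopped actor loses nothing: it is re-created and recovers its state the next time a message for it arrives.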
Fixing storage issues
- A snapshot was made even for actors with empty state
- Check if the state is dirty before saving a snapshot
- On SaveSnapshotSuccess:
  - clean old snapshots based on metadata
  - delete journal entries
commit 2016179d8b43698444511ebdcbf3e81fadf8f140
Date: Wed Nov 5 18:00:53 2014 +0100
    first go at cleanup of snapshots and journal entries
commit 2b8f101c04d0cb87b585c63e94cdb779d15f15ce
Date: Thu Nov 6 08:57:25 2014 +0100
    improved snapshot cleanup
commit 7fdae48fbc5ca7f2be5a4ad349521b065b0fca81
Date: Thu Nov 6 10:34:21 2014 +0100
    added test for snapshot cleanup
Actor supervision
- Poison pill / suicide turned into massacre ;-)
- Stop exceptions from bubbling up
- Stop actor in case of failure
- Revive and restore state in case needed
commit dea72b1fa0db5e05220496224c5e3c62db477c3c
Date: Fri Nov 7 09:36:46 2014 +0100
    added supervisor strategy in actors, changed inmemory journal plugin, tweaking of journal message deletion
    Need to add tests for all of this!!
commit a12aea367e847825aba64603ae1e872a7183aa65
Date: Wed Nov 12 12:33:45 2014 +0100
    added test to make sure snapshotting only happens when state is dirty
commit 7c3175ece5be1560e3774d55ec478818b77ab3ff
Date: Wed Nov 12 15:01:26 2014 +0100
    added test: try to simulate error just after the hour
JVM tuning
- Use simple client simulator: 100 metrics / second
- Monitor metrics server
- Run over lunch break / overnight
- Large young generation for Scala
- GC options for shorter pauses and better response times
Production is on Solaris
-Xms128M
-Xmx512M
-XX:MaxMetaspaceSize=128m
-XX:NewSize=450M
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-verbose:gc
-server
If we could choose how to improve it...
Possible improvements on front-end
Replace the Dashing.io framework with D3 + Spray.io
- the former is a powerful visualization library
- the latter we already have for serving the API
- the idea: serve the static JS files, which would call the REST API
We didn't really think about an MVVC framework; why not pure WebComponents + ES6?
Use WebSockets for real-time information
- Dashing.io already uses SSE, but the server side has occasional hiccups
Possible improvements on back-end
Introduce a bit more serious snapshot plugin
- Snapshot plugins (http://akka.io/community/): MongoDB, PostgreSQL...
- This was hurting our eyes from the start, but the funny thing is... it works even as is
Unify actor implementation
- we should be using buckets everywhere
Current state
sealed trait ActorCache[T] {
  val nameRefMap: mutable.Map[T, ActorRef]
}
trait MetricActorCache extends ActorCache[Metric] {...} // root one
trait TimeScopeActorCache extends ActorCache[TimeScope] {...} // keeping per hour / minute
class CounterMetricsIntervalActor(val metric: Metric, val scope: TimeScope) { // lowest level
  private var buckets: mutable.Map[Int, Int]
}
Randomization of the children's sleeping pill
- at this time our GC spikes on round hours
- this would ease the pressure on the snapshot store a bit
- it would also give us extra stability, since sometimes incoming messages get lost
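Randomizing the "sleeping pill" could be as simple as adding jitter to each child's idle timeout, so that children created around the same time do not all stop and snapshot at the same moment. The helper below is a hypothetical sketch of that calculation, not something the project implemented.

```scala
import scala.concurrent.duration._
import scala.util.Random

object IdleTimeoutJitter {
  // Hypothetical helper: vary `base` by up to +/- `spread` (a fraction of its length).
  def jittered(base: FiniteDuration, spread: Double, rnd: Random): FiniteDuration = {
    require(spread >= 0.0 && spread < 1.0, "spread must be in [0, 1)")
    // factor is uniformly distributed in [1 - spread, 1 + spread)
    val factor = 1.0 + (rnd.nextDouble() * 2.0 - 1.0) * spread
    (base.toMillis * factor).toLong.millis
  }
}
```

For example, `IdleTimeoutJitter.jittered(30.minutes, 0.2, new Random())` gives each child a timeout between roughly 24 and 36 minutes.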
If we had even more time...
Clustering
- how scalable would it be?
- what would be the throughput?
- how would we handle node failure?
Functional improvements
- cohort analysis
- dynamic time selection
Akka Persistence
- use a persistent view instead of a persistent actor
- we only care about the query
- original commands are thrown away
Q&A
Thank you!