Date posted: 29-Nov-2014
Category: Engineering
Uploaded by: alpinedatalabs

Integrating Non-Reactive Legacy Code - The Case of R
Marek Kolodziej, Machine Learning Engineer
SF Scala Meetup, Sep. 10, 2014

Reactive Recap
Event-driven
- Asynchronous
- Non-blocking
- Optimized around Amdahl's Law
Scalable
- Location transparency (up and out)
- Factor in unreliable network
Resilient
- Failure isolation (bulkhead pattern, etc.)
- Clean service and failure handling separation (supervision)
Responsive
- Minimize latency
- Deal with bursty traffic
- Gracefully handle congestion (backpressure/active pull by subscriber)

The tough reality: Not everything's under your control
Not everything's an actor
- Legacy Java/Scala code
- Third-party libraries
Blocking calls
- Database queries
- Calls to services
- Non-threaded runtimes (R)
Long-running jobs
- Resource clean-up in case a network partition occurs long before the time-out is reached
- Timeouts vs. heartbeats
Not all failures are within the JVM
- Can we revive them from within the JVM?
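
The blocking calls listed above are commonly kept away from the threads that serve everything else. The following is a minimal sketch of that idea using only the Scala standard library, not the code from the talk: `BlockingIsolation`, `slowLegacyCall`, `callWithTimeout` and the pool size of 4 are all illustrative assumptions.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Try

object BlockingIsolation {
  // Dedicated pool for blocking work, so other work is never starved of threads.
  // The pool size of 4 is an arbitrary illustrative choice.
  private val pool = Executors.newFixedThreadPool(4)
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutor(pool)

  // Hypothetical stand-in for a blocking call (database query, R evaluation, ...).
  def slowLegacyCall(millis: Long): String = {
    Thread.sleep(millis)
    s"done after $millis ms"
  }

  // Runs the blocking call on the dedicated pool and bounds the caller's wait.
  // A timeout surfaces as a Failure; the underlying call is NOT cancelled.
  def callWithTimeout(millis: Long, timeout: FiniteDuration): Try[String] = {
    val result = Future(slowLegacyCall(millis))(blockingEc)
    Try(Await.result(result, timeout))
  }

  def shutdown(): Unit = pool.shutdown()
}
```

Note that the time-out only frees the caller: the legacy call keeps occupying a pool thread until it returns, which is why resource clean-up is listed as a separate concern. `Await.result` is itself blocking and is used here only to keep the sketch short; inside an actor, the future's result would more typically be sent back as a message.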

Alpine's R Operator

Alpine's R Operator: The cases for and against R
For
- 5,000+ statistical and machine learning libraries
- "[Numeric] gold standard" implementations
- Operator would allow arbitrary processing in a "canned" application
- Data scientists already know the language
- Support for client's existing code base (100s of scripts)
- Very rapid prototyping
- Focus on science instead of coding
Against
- Slow runtime (even with JIT)
- Memory hogging (by-copy semantics)
- Very slow garbage collection
- Single-threaded runtime (even worse than Python and Ruby)
- Native libraries written by people without much CS/engineering background (segfaults, etc.)
- Buggy libraries (infinite loops, etc.)
- Runtime crashes
- Terrible handling of big datasets

Challenges
Licensing issues
- R is GPL
- RServe is (L)GPL
- Shipped software (GPL SaaS loophole doesn't apply)
Distributed computing
- Need a cluster of R workers (multi-user, multi-operator concurrency given a single-threaded R runtime)
- REST is good for data but pretty bad for control (some structure would be nice)
- Sessions or backpressure
Fault tolerance
- R runtime failures
- Network partitions (R session clean-up)

Solutions
Licensing issues
- Akka is Apache 2.0
- RServe is (L)GPL
- Can open-source the R-Java server bridge
- Communication to Alpine backend via (open-source) message case classes
Distributed computing
- Akka's location transparency is ideal for distributing work
- Cluster API would have been preferred, but Alpine uses Akka 2.2.3 due to Spark dependency
- Structure and semantics due to message case classes
- Rx streams would have been nice for backpressure, but we have an old Akka version (so sessions)
Fault tolerance
- Rserve forks R processes. Exception handling of the Connection object lets you restart processes.
- Akka's heartbeat allows session clean-up in case of network failure before time-out (important if time-out is ~1 day).
- Event bus lets you observe failure to connect to remote actor system.
- No need for exactly-once semantics (the user can re-run the flow), but you have to know that the failure occurred.
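
The point about heartbeats versus time-outs can be illustrated with a small standalone sketch. This is not Akka's own heartbeat mechanism and not the code from the talk; `SessionRegistry`, `heartbeat`, `reapExpired` and the 30-second window used in the usage note are hypothetical names and values chosen for illustration. The idea is that a session whose client has gone silent is detected after a short heartbeat window, rather than after a job time-out that may be around a day.

```scala
import scala.concurrent.duration._

// Hypothetical registry: tracks the last heartbeat per R session and reports
// sessions whose client has gone silent. Not thread-safe; it is meant to be
// state owned by a single actor.
final class SessionRegistry(heartbeatTimeout: FiniteDuration) {
  // sessionId -> timestamp (millis) of the last heartbeat
  private var lastSeen = Map.empty[String, Long]

  // Records a heartbeat (also registers a previously unknown session).
  def heartbeat(sessionId: String, nowMillis: Long): Unit =
    lastSeen = lastSeen.updated(sessionId, nowMillis)

  // Removes and returns sessions with no heartbeat inside the allowed window.
  // The caller would then release the corresponding R processes.
  def reapExpired(nowMillis: Long): Set[String] = {
    val expired = lastSeen.collect {
      case (id, seen) if nowMillis - seen > heartbeatTimeout.toMillis => id
    }.toSet
    lastSeen = lastSeen -- expired
    expired
  }

  def active: Set[String] = lastSeen.keySet
}
```

With a 30-second window, a session that last reported at time 0 is reaped at 40 seconds, while one that reported at 20 seconds stays active. Passing the clock in as `nowMillis` keeps the sketch deterministic and easy to test.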

Solutions
Sessions
- Arguably the ugliest part of the solution (can be replaced with alternatives)
- Worker actors blocked for long periods (hours).
- Large data blocks are sent to the Akka R server (~128 MB).
- No backpressure via Rx streams since it's Akka 2.2.3.
- Custom router: refuses requests if all workers are busy.
- Client needs to respond to request refusal by awaiting a free worker message (reactive but inelegant).
- Better solution: use reactive streams (we need to upgrade Akka)
- Improvement: use Akka for control but REST for data movement
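
The refuse-then-notify behaviour of the custom router can be sketched as plain state plus a few message case classes. This is an illustrative reconstruction under stated assumptions, not Alpine's implementation: `BusyAwareRouter`, `Accepted`, `Refused` and `WorkerFree` are hypothetical names, and reserving the freed worker for the first waiting client is an assumption of this sketch.

```scala
import scala.collection.immutable.Queue

// Hypothetical message protocol, modelled as case classes.
sealed trait RouterReply
final case class Accepted(workerId: Int) extends RouterReply
case object Refused extends RouterReply
final case class WorkerFree(clientId: String, workerId: Int)

// Not thread-safe: meant to be state owned by a single router actor.
final class BusyAwareRouter(workerCount: Int) {
  private var idle: Set[Int] = (0 until workerCount).toSet
  private var waiting: Queue[String] = Queue.empty

  // Hands out an idle worker, or refuses and remembers the client.
  def request(clientId: String): RouterReply =
    idle.headOption match {
      case Some(w) =>
        idle -= w
        Accepted(w)
      case None =>
        if (!waiting.contains(clientId)) waiting = waiting.enqueue(clientId)
        Refused
    }

  // Called when a worker finishes. If a client is waiting, the worker is
  // reserved for it and a WorkerFree notification is returned; otherwise
  // the worker goes back to the idle set.
  def release(workerId: Int): Option[WorkerFree] =
    waiting.dequeueOption match {
      case Some((client, rest)) =>
        waiting = rest
        Some(WorkerFree(client, workerId))
      case None =>
        idle += workerId
        None
    }
}
```

In an actor setting, `Refused` and `WorkerFree` would be sent to the client as messages. The client still has to hold on to its data until `WorkerFree` arrives, which is the inelegance the slide refers to and what demand-driven reactive streams would remove.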

Future Improvements
- Data movement via REST
- Replacement of sessions via reactive streams (Akka upgrade!)
- Kamon test drive for distributed actors (released ~2 weeks ago)

Conclusions
- Akka makes even non-reactive distributed programming easier and more reliable
- If you can, use the latest Akka version because a lot of the earlier pain can be avoided:
  - clustering
  - persistence
  - reactive streams
- Large data movement via Akka is probably not an ideal use of the framework:
  - use REST (including Spray, Play, etc.) and HTTP chunking
  - move the data directly using Netty, etc.

Thank You

Miscellaneous
- Alpine is hiring:
  - machine learning engineers (Scala/Java)
  - data scientists (R/Python)
  - front end developers (Ruby on Rails)
- ScalaCourses.com is looking for reviewers:
  - Scala (beginner/intermediate)
  - Akka
  - Play
  - Java Interop.
  - contact Michael Slinn: [email protected]