Back to the futures, actors and pipes: using Akka for large-scale data migration

Date post: 01-Dec-2014
Upload: manuel-bernhardt
Description:
Slides of my talk at Scala.io 2014 about large-scale data migration with Akka.
Transcript
Page 1: Back to the futures, actors and pipes: using Akka for large-scale data migration

@ELMANU

Manuel Bernhardt

Back to the futures, actors and pipes:

using Akka for large-scale data migration

Page 2: Back to the futures, actors and pipes: using Akka for large-scale data migration

AGENDA

• BACKGROUND STORY

• FUTURES > PIPES < ACTORS

• LESSONS LEARNED

Page 3: Back to the futures, actors and pipes: using Akka for large-scale data migration

who is speaking?

• freelance software consultant based in Vienna

• Vienna Scala User Group

• web, web, web

• writing a book on reactive web-applications

Page 4: Back to the futures, actors and pipes: using Akka for large-scale data migration

BACKGROUND STORY

Page 5: Back to the futures, actors and pipes: using Akka for large-scale data migration

talenthouse

• www.talenthouse.com

• based in Los Angeles

• connecting brands and artists

• 3+ million users

Page 6: Back to the futures, actors and pipes: using Akka for large-scale data migration
Page 7: Back to the futures, actors and pipes: using Akka for large-scale data migration
Page 8: Back to the futures, actors and pipes: using Akka for large-scale data migration
Page 9: Back to the futures, actors and pipes: using Akka for large-scale data migration
Page 10: Back to the futures, actors and pipes: using Akka for large-scale data migration

BACKGROUND STORY

• old, slow (very slow) platform

• re-implementation from scratch with Scala & Play

• tight schedule, a lot of data to migrate

Page 11: Back to the futures, actors and pipes: using Akka for large-scale data migration

SOURCE SYSTEM

Page 12: Back to the futures, actors and pipes: using Akka for large-scale data migration

SOURCE SYSTEM

DISCLAIMER:

What follows is not intended as a bashing of the source system, but as a necessary explanation of its complexity in relation to data migration.

Page 13: Back to the futures, actors and pipes: using Akka for large-scale data migration

SOURCE SYSTEM

Page 21: Back to the futures, actors and pipes: using Akka for large-scale data migration

MIGRATION schedule

• basically, one weekend

• big-bang style migration

• if possible, incremental migration beforehand

Page 22: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES > PIPES < ACTORS

Page 23: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES

Page 24: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES

• scala.concurrent.Future[T]

• eventually holds a value of type T

• can either fail or succeed

Page 25: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES: HAPPY PATH

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global

val futureSum: Future[Int] = Future { 1 + 1 }

futureSum.map { sum =>
  println("The sum is " + sum)
}

Page 27: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES: SAD PATH

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futureDiv: Future[Int] = Future { 1 / 0 }

val futurePrint: Future[Unit] = futureDiv.map { div =>
  println("The division result is " + div)
}

Await.result(futurePrint, 1.second)

Avoid blocking if possible

Page 28: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES: SAD PATH

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futureDiv: Future[Int] = Future { 1 / 0 }

futureDiv.map { div =>
  println("The division result is " + div)
}

Await.result(futureDiv, 1.second)

scala> Await.result(futureDiv, 1.second)
java.lang.ArithmeticException: / by zero
    at $anonfun$1.apply$mcI$sp(<console>:11)
    at $anonfun$1.apply(<console>:11)
    at $anonfun$1.apply(<console>:11)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Page 29: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES: SAD PATH

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futureDiv: Future[Int] = Future { 1 / 0 }

val futurePrint: Future[Unit] = futureDiv.map { div =>
  println("The division result is " + div)
}.recover {
  case a: java.lang.ArithmeticException =>
    println("What on earth are you trying to do?")
}

Await.result(futurePrint, 1.second)

Be mindful of failure

Page 30: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES: SAD PATH

• Exceptions are propagated up the chain

• Without recover there is no guarantee that failure will ever get noticed!
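
The two points above can be sketched with the standard library alone. The names `RecoverSketch`, `safeDivide` and `run` are made up for this illustration and do not come from the talk. The exception thrown inside the first `Future` skips the `map` step and only becomes visible because `recover` turns it into a value the caller can inspect.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical names, for illustration only.
object RecoverSketch {

  // Divides asynchronously; a failure becomes a Left instead of going unnoticed.
  def safeDivide(a: Int, b: Int): Future[Either[String, Int]] =
    Future { a / b } // throws ArithmeticException when b == 0
      .map(result => Right(result): Either[String, Int]) // skipped when the division fails
      .recover { case e: ArithmeticException => Left("division failed: " + e.getMessage) }

  // Blocks only to show the outcome in a small demo; avoid Await in real code paths.
  def run(a: Int, b: Int): Either[String, Int] =
    Await.result(safeDivide(a, b), 1.second)
}
```

Calling `RecoverSketch.run(4, 2)` yields a `Right`, while `RecoverSketch.run(1, 0)` yields a `Left` describing the failure rather than throwing.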

Page 31: Back to the futures, actors and pipes: using Akka for large-scale data migration

COMPOSING FUTURES

val futureA: Future[Int] = Future { 1 + 1 }
val futureB: Future[Int] = Future { 2 + 2 }

val futureC: Future[Int] = for {
  a <- futureA
  b <- futureB
} yield {
  a + b
}

Page 33: Back to the futures, actors and pipes: using Akka for large-scale data migration

COMPOSING FUTURES

val futureC: Future[Int] = for {
  a <- Future { 1 + 1 }
  b <- Future { 2 + 2 }
} yield {
  a + b
}

This runs in sequence

Don’t do this
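
Why this runs in sequence: a for-comprehension over Futures desugars to `flatMap`, so the second `Future { ... }` is only created once the first has completed. A minimal sketch of both variants side by side follows; the object and method names are invented for the illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical names, for illustration only.
object ComposeSketch {

  // Sequential: the second Future is created inside flatMap,
  // i.e. only after the first one has completed.
  def sequential(): Future[Int] =
    for {
      a <- Future { 1 + 1 }
      b <- Future { 2 + 2 }
    } yield a + b

  // Concurrent: both Futures are started before the for-comprehension,
  // which then only combines their results.
  def concurrent(): Future[Int] = {
    val futureA = Future { 1 + 1 }
    val futureB = Future { 2 + 2 }
    for {
      a <- futureA
      b <- futureB
    } yield a + b
  }

  // Blocks only to show the outcome in a small demo.
  def run(): (Int, Int) =
    (Await.result(sequential(), 1.second), Await.result(concurrent(), 1.second))
}
```

Both variants produce the same sum; they differ only in when the second computation is allowed to start.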

Page 34: Back to the futures, actors and pipes: using Akka for large-scale data migration

FUTURES: CALLBACKS

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global

val futureDiv: Future[Int] = Future { 1 / 0 }

futureDiv.onSuccess { case result =>
  println("Result: " + result)
}

futureDiv.onFailure { case t: Throwable =>
  println("Oh no!")
}
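
A note for readers on newer Scala versions: `onSuccess` and `onFailure` were deprecated in Scala 2.12 and removed in 2.13. The same callbacks can be written with `onComplete`, as in the sketch below. `CallbackSketch` and `describe` are illustrative names, and the `Promise` exists only so the sketch can hand the callback's message back to the caller.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

// Hypothetical names, for illustration only.
object CallbackSketch {

  // Runs the computation in a Future, registers one callback for both outcomes
  // and returns the message that the callback produced.
  def describe(computation: => Int): String = {
    val message = Promise[String]()
    Future(computation).onComplete {
      case Success(result) => message.success("Result: " + result)
      case Failure(_)      => message.success("Oh no!")
    }
    // Blocks only to show the outcome in a small demo.
    Await.result(message.future, 1.second)
  }
}
```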

Page 35: Back to the futures, actors and pipes: using Akka for large-scale data migration

using FUTURES

• a Future { … } block that doesn’t do any I/O is a code smell

• use them in combination with the “right” ExecutionContext set-up

• when you have blocking operations, wrap them in a blocking block

Page 36: Back to the futures, actors and pipes: using Akka for large-scale data migration

using FUTURES

import scala.concurrent.blocking

Future {
  blocking {
    DB.withConnection { implicit connection =>
      val query = SQL("select * from bar")
      query()
    }
  }
}
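
One possible reading of the “right” ExecutionContext advice, sketched under assumptions: blocking calls get their own bounded thread pool so that they cannot starve the default one. The pool size of 4 and the `slowLookup` stand-in are arbitrary choices for the illustration, not values from the talk.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical names, for illustration only.
object BlockingPoolSketch {

  // Stand-in for a blocking call such as a JDBC query.
  private def slowLookup(id: Int): String = {
    Thread.sleep(50)
    "row-" + id
  }

  def run(ids: Seq[Int]): Seq[String] = {
    // Dedicated, bounded pool for blocking work; the size is an arbitrary choice.
    val pool = Executors.newFixedThreadPool(4)
    implicit val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Every lookup runs on the dedicated pool, at most 4 at a time.
      val lookups = Future.traverse(ids)(id => Future(slowLookup(id)))
      Await.result(lookups, 5.seconds)
    } finally pool.shutdown()
  }
}
```

A fixed-size pool also acts as a crude concurrency limit towards the system being queried, which matters when that system is a fragile legacy database.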

Page 38: Back to the futures, actors and pipes: using Akka for large-scale data migration

naming FUTURES

“Say eventuallyMaybe one more time!”

Page 39: Back to the futures, actors and pipes: using Akka for large-scale data migration

ACTORS

Page 40: Back to the futures, actors and pipes: using Akka for large-scale data migration

ACTORS

• lightweight objects

• send and receive messages (mailbox)

• can have children (supervision)

Page 41: Back to the futures, actors and pipes: using Akka for large-scale data migration

ACTORS

akka://application/user/georgePeppard (Mailbox)

akka://application/user/audreyHepburn (Mailbox)

akka://application/user/audreyHepburn/cat (Mailbox)

Page 43: Back to the futures, actors and pipes: using Akka for large-scale data migration

ACTORS

"Holly, I'm in love with you."

akka://application/user/georgePeppard (Mailbox)

akka://application/user/audreyHepburn (Mailbox)

akka://application/user/audreyHepburn/cat

So what?

Page 44: Back to the futures, actors and pipes: using Akka for large-scale data migration

GETTING AN ACTOR

import akka.actor._

class AudreyHepburn extends Actor {
  def receive = { ... }
}

val system: ActorSystem = ActorSystem()

val audrey: ActorRef = system.actorOf(Props[AudreyHepburn])

Page 47: Back to the futures, actors and pipes: using Akka for large-scale data migration

SENDING AND RECEIVING MESSAGES

case class Script(text: String)

class AudreyHepburn extends Actor {
  def receive = {
    case Script(text) => read(text)
  }
}

audrey ! Script(breakfastAtTiffany)

“tell” - fire and forget

Page 49: Back to the futures, actors and pipes: using Akka for large-scale data migration

ASK PATTERN

import akka.pattern.ask
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

implicit val timeout = akka.util.Timeout(1.second)

// ask returns a Future[Any]; mapTo narrows it to the expected reply type
val maybeAnswer: Future[String] =
  (audrey ? "Where should we have breakfast?").mapTo[String]

“ask”

Page 51: Back to the futures, actors and pipes: using Akka for large-scale data migration

SUPERVISION

class UserMigrator extends Actor {

  lazy val workers: ActorRef = context.actorOf(
    Props[UserMigrationWorker]
      .withRouter(RoundRobinRouter(nrOfInstances = 100))
  )

}

actor context: context

many children: nrOfInstances = 100

router type: RoundRobinRouter

Page 52: Back to the futures, actors and pipes: using Akka for large-scale data migration

SUPERVISION

Page 53: Back to the futures, actors and pipes: using Akka for large-scale data migration

SUPERVISION

import akka.actor.SupervisorStrategy.Restart

class UserMigrator extends Actor with ActorLogging {

  lazy val workers: ActorRef = context.actorOf(
    Props[UserMigrationWorker]
      .withRouter(RoundRobinRouter(nrOfInstances = 100))
  )

  // restart a failing child, giving up after 3 retries
  override def supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3) {
      case t: Throwable =>
        log.error(t, "A child died!")
        Restart
    }
}

Page 54: Back to the futures, actors and pipes: using Akka for large-scale data migration

PIPES

Page 55: Back to the futures, actors and pipes: using Akka for large-scale data migration

CECI EST UNE PIPE (this is a pipe)

• Akka pattern to combine Futures and Actors

• Sends the result of a Future to an Actor

• Be careful with error handling

Page 57: Back to the futures, actors and pipes: using Akka for large-scale data migration

CECI EST UNE PIPE

class FileFetcher extends Actor {

  def receive = {
    case FetchFile(url) =>
      val originalSender = sender()
      val download: Future[DownloadedFile] =
        WS.url(url).get().map { response =>
          DownloadedFile(
            url,
            response.ahcResponse.getResponseBodyAsBytes
          )
        }

      import akka.pattern.pipe
      download pipeTo originalSender
  }
}

• This is how you pipe

• Keep reference to original sender - what follows is a Future!

• Wrap your result into something you can easily match against

Page 60: Back to the futures, actors and pipes: using Akka for large-scale data migration

CECI EST UNE PIPE

class FileFetcher extends Actor {

  def receive = {
    case FetchFile(url) =>
      val originalSender = sender
      val download: Future[Array[Byte]] =
        WS.url(url).get().map { response =>
          DownloadedFile(
            url,
            response.ahcResponse.getResponseBodyAsBytes
          )
        }

      import akka.pattern.pipe
      download pipeTo originalSender
  }
}

Will this work?

Page 61: Back to the futures, actors and pipes: using Akka for large-scale data migration

PIPES AND error handling

class FileFetcher extends Actor {

  def receive = {
    case FetchFile(url) =>
      val originalSender = sender()
      val download = WS.url(url).get().map { response =>
        DownloadedFile(...)
      } recover {
        case t: Throwable => DownloadFileFailure(url, t)
      }

      import akka.pattern.pipe

      download pipeTo originalSender
  }
}

Don’t forget to recover!
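
The idea of wrapping both outcomes into messages that are easy to match against can be tried without Akka or Play. The sketch below uses only the standard library and leaves out the actor and `pipeTo`. The `fetch` parameter is a hypothetical stand-in for the HTTP call, and the fields of the two case classes are assumed from how the slides use them.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical names and fields, for illustration only.
object DownloadResultSketch {

  // One message type per outcome, so a receiving actor can pattern match on it.
  sealed trait DownloadResult
  final case class DownloadedFile(url: String, bytes: Array[Byte]) extends DownloadResult
  final case class DownloadFileFailure(url: String, cause: Throwable) extends DownloadResult

  // `fetch` stands in for the HTTP call; any failure becomes a DownloadFileFailure.
  def download(url: String)(fetch: String => Array[Byte]): Future[DownloadResult] =
    Future[DownloadResult] { DownloadedFile(url, fetch(url)) }
      .recover { case t: Throwable => DownloadFileFailure(url, t) }

  // Roughly what the receiving actor's `receive` could do with the piped message.
  def describe(result: DownloadResult): String = result match {
    case DownloadedFile(url, bytes)  => s"fetched ${bytes.length} bytes from $url"
    case DownloadFileFailure(url, t) => s"failed to fetch $url: ${t.getMessage}"
  }

  // Blocks only to show the outcome in a small demo.
  def run(url: String)(fetch: String => Array[Byte]): String =
    Await.result(download(url)(fetch).map(describe), 1.second)
}
```

Because the failure message carries the URL, the receiver knows which download to retry or mark as failed, which a bare exception would not tell it.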

Page 62: Back to the futures, actors and pipes: using Akka for large-scale data migration

SUMMARY

• Futures: manipulate and combine asynchronous operation results

• Actors: organise complex asynchronous flows, deal with failure via supervision

• Pipes: deal with results of asynchronous computation inside of actors

Page 63: Back to the futures, actors and pipes: using Akka for large-scale data migration

LESSONS LEARNED

Page 64: Back to the futures, actors and pipes: using Akka for large-scale data migration

design according to YOUR DATA

User migrator

Worker Worker Worker Worker Worker

Page 66: Back to the futures, actors and pipes: using Akka for large-scale data migration

design according to YOUR DATA

design A

Item migrator
  User item migrator
    Item migration worker
    Item migration worker
  User item migrator
    Item migration worker
    Item migration worker
  User item migrator
    Item migration worker
    Item migration worker

Not all users have the same amount of items

Page 67: Back to the futures, actors and pipes: using Akka for large-scale data migration

design according to YOUR DATA

design B

Item migrator

User item migrator (x3)

Item migration worker (x6)

File fetcher (x2)

File uploader (x2)

Soundcloud worker

Page 68: Back to the futures, actors and pipes: using Akka for large-scale data migration

design according to YOUR DATA

design B

Item migrator

User item migrator (x3)

Item migration worker (x6)

File fetcher (x2)

File uploader (x2)

Soundcloud worker

Pools of actors

Page 69: Back to the futures, actors and pipes: using Akka for large-scale data migration

KNOW THE limits OF THY SOURCE SYSTEM

Page 71: Back to the futures, actors and pipes: using Akka for large-scale data migration

DATA MIGRATION SHOULD not BE A RACE

• Your goal is to get the data, not to be as fast as possible

• Be gentle to the legacy system(s)

Page 73: Back to the futures, actors and pipes: using Akka for large-scale data migration

CLOUD API STANDARDS

• ISO-28601 Data formats in REST APIs

• ISO-28700 Response times and failure communication of REST APIs

• ISO-28701 Rate limits in REST APIs and HTTP error codes

DREAM ON

Page 74: Back to the futures, actors and pipes: using Akka for large-scale data migration

NO STANDARDS!

• The cloud is heterogeneous

• Response times, rate limits, error codes all different

• Don’t even try to treat all systems the same

Page 76: Back to the futures, actors and pipes: using Akka for large-scale data migration

RATE limits

• Read the docs - most cloud API docs will warn you about them

• Design your actor system so that you can queue if necessary

• Keep track of migration status
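
The talk handles queueing with actors. As a much smaller illustration of the same concern, the sketch below limits how many calls are in flight by working through the items in batches, using plain Futures. It swaps in a simpler technique than an actor-based queue, and `inBatches`, the batch size and the doubling function are invented for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical names, for illustration only.
object BatchSketch {

  // Runs `call` for every item, but never more than `batchSize` calls at once:
  // a batch is only started after the previous batch has completed.
  def inBatches[A, B](items: Seq[A], batchSize: Int)(call: A => Future[B]): Future[Seq[B]] =
    items.grouped(batchSize).foldLeft(Future.successful(Vector.empty[B])) { (done, batch) =>
      for {
        soFar   <- done                              // wait for all earlier batches
        results <- Future.sequence(batch.map(call))  // then start this batch
      } yield soFar ++ results
    }

  // Blocks only to show the outcome in a small demo.
  def run(): Seq[Int] =
    Await.result(inBatches(1 to 10, batchSize = 3)(n => Future(n * 2)), 5.seconds)
}
```

Results come back in the original item order. A failed call fails the whole run here, so a real migration would combine this with the recover-into-a-message approach and with status tracking.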

Page 78: Back to the futures, actors and pipes: using Akka for large-scale data migration

RATE limits

• Example: Soundcloud API

• 500 Internal Server Error after a seemingly random number of requests

WS
  .url("http://api.soundcloud.com/resolve.json")
  .withHeaders("User-Agent" -> "FOOBAR") // the magic ingredient that
                                         // opens the door to Soundcloud

Magic User-Agent

Page 79: Back to the futures, actors and pipes: using Akka for large-scale data migration

BLOCKING

Page 80: Back to the futures, actors and pipes: using Akka for large-scale data migration

seriously, do not BLOCK

• Blocking from time to time seems innocent at first

• An OutOfMemory error 8 hours into a migration run is not very funny

• You will end up rewriting your whole code to be async anyway

Page 81: Back to the futures, actors and pipes: using Akka for large-scale data migration

MISC

• Unstable primary IDs in source system

• Build a lot of small tools, be pragmatic

• sbt-tasks (http://yobriefca.se/sbt-tasks/)

Page 83: Back to the futures, actors and pipes: using Akka for large-scale data migration

THE END

QUESTIONS?

