Pickles & Spores
On Improving Scala’s Support for Distributed Programming
Heather Miller
with: Philipp Haller (Typesafe), Eugene Burmako (EPFL), Martin Odersky (EPFL/Typesafe)
What is this talk about?
Making distributed programming easier in Scala
This kind of distributed system
(insert social network of your choice here)
Also!
Bottom line: machines communicating.
How can we simplify distribution at the language level?
Agenda
Scala Pickling
Spores
What is it?
Pickling == Serialization == Marshalling
Very different from Java serialization
https://github.com/scala/pickling
Wait, why do we care?
Slow!
Closed!
• not-serializable exceptions at runtime
• can’t retroactively make classes serializable
Enter: Scala Pickling
Fast: Serialization code is generated at compile time and inlined at the use site.
Flexible: Using the type class pattern, retroactively make types serializable.
No boilerplate: Type class instances are generated at compile time.
Pluggable formats: Effortlessly change the format of serialized data: binary, JSON, or invent your own!
Typesafe: Picklers are type-specialized. Catch errors at compile time!
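The "flexible" and "typesafe" points can be sketched in plain Scala. This is a minimal illustration of the type class pattern, not the scala-pickling API: ToyPickler, Point, and the String output are all hypothetical names and simplifications chosen for this example.

```scala
object TypeclassSketch {
  // Hypothetical, simplified type class; not the scala-pickling API.
  trait ToyPickler[T] { def pickle(x: T): String }

  // Stand-in for a class we do not control and cannot modify.
  final class Point(val x: Int, val y: Int)

  implicit val intPickler: ToyPickler[Int] = new ToyPickler[Int] {
    def pickle(x: Int): String = x.toString
  }

  // Retroactively make Point picklable without touching its definition.
  implicit val pointPickler: ToyPickler[Point] = new ToyPickler[Point] {
    def pickle(p: Point): String =
      "Point(" + intPickler.pickle(p.x) + "," + intPickler.pickle(p.y) + ")"
  }

  // The instance is resolved at compile time; if none is in scope,
  // the call does not compile (instead of failing at runtime).
  def pickle[T](x: T)(implicit p: ToyPickler[T]): String = p.pickle(x)
}
```

With `import TypeclassSketch._` in scope, `pickle(new Point(1, 2))` compiles, while `pickle(new java.util.Date)` would be rejected by the compiler because no ToyPickler[java.util.Date] exists.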
What does it look like?
scala> import scala.pickling._
import scala.pickling._

scala> import json._
import json._

scala> case class Person(name: String, age: Int)
defined class Person

scala> Person("John Oliver", 36)
res0: Person = Person(John Oliver,36)

scala> res0.pickle
res1: scala.pickling.json.JSONPickle =
JSONPickle({
  "tpe": "Person",
  "name": "John Oliver",
  "age": 36
})
Benchmarks: collections, time
Benchmarks: collections, free memory (more is better)
Benchmarks: collections, size
Benchmarks: GeoTrellis, time
Benchmarks: Evactor, time (Java runs out of memory)
Benchmarks: Evactor, time (no Java, more events)
btw, that’s just the default behavior...
You can really customize Scala Pickling too.
Customizing Pickling
Previous examples used default behavior.
Pickling is very customizable:
• Generated picklers, standard pickle format
• Custom picklers for specific types, custom pickle format
Before we can show these things, let's have a look at the building blocks of the framework...
Wait, what about subclasses?
abstract class Person {
  def name: String
}
case class Firefighter(name: String, since: Int) extends Person
case class Doctor(name: String, since: Int) extends Person

class Position(title: String, person: Person)
Scala is object-oriented too, remember?
To generate a pickler for Position, combine picklers for String and Person.
But person could dynamically be a Firefighter or a Doctor!
Goal: modular composition of picklers
Pickling is very customizable
A `Pickler[Person]` is not good enough: it can only pickle instances whose dynamic type is `Person`.
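One way to picture the subtyping problem is a pickler for Person that dispatches on the dynamic type of its argument. The sketch below is hand-written and uses a hypothetical ToyPickler trait with String output; it is an assumption about the general idea, not how scala-pickling generates or dispatches picklers.

```scala
object SubtypeSketch {
  // Hypothetical, simplified pickler type; not the scala-pickling API.
  trait ToyPickler[T] { def pickle(x: T): String }

  abstract class Person { def name: String }
  case class Firefighter(name: String, since: Int) extends Person
  case class Doctor(name: String, since: Int) extends Person

  val firefighterPickler: ToyPickler[Firefighter] = new ToyPickler[Firefighter] {
    def pickle(f: Firefighter): String = "Firefighter(" + f.name + "," + f.since + ")"
  }
  val doctorPickler: ToyPickler[Doctor] = new ToyPickler[Doctor] {
    def pickle(d: Doctor): String = "Doctor(" + d.name + "," + d.since + ")"
  }

  // A pickler for the static type Person that picks the concrete
  // pickler based on the runtime class of the value.
  val personPickler: ToyPickler[Person] = new ToyPickler[Person] {
    def pickle(p: Person): String = p match {
      case f: Firefighter => firefighterPickler.pickle(f)
      case d: Doctor      => doctorPickler.pickle(d)
      case other          => sys.error("no pickler for " + other.getClass.getName)
    }
  }
}
```

A pickler for Position could then be composed from a String pickler and personPickler without knowing which subclass of Person it will meet at runtime.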
Generated picklers Standard pickle format
Custom picklers for specific types Custom pickle format
Subtyping
Pickler Combinators
Elegant programming pearl that comes from functional programming.
A composable and "constructive" way to think about persisting data.
Compose picklers for simple types to build picklers for more complicated types.
What is a pickler? (a simplified version of what’s actually used in scala-pickling)
trait Pickler[T] {
  // returns next write position
  def pickle(arr: Array[Byte], i: Int, x: T): Int
  // returns result plus next read position
  def unpickle(arr: Array[Byte], i: Int): (T, Int)
}
Pickler Combinators
We need 2 things:
1. Picklers for base types: fully-implemented picklers for some basic types like primitives
2. Functions that combine existing picklers to build compound picklers
Example: a combinator that takes a Pickler[T] and returns a Pickler[List[T]]
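The two ingredients can be sketched against the simplified Pickler[T] trait from the slides. The byte layout here (4-byte big-endian Ints, a length prefix for lists) is an assumption made for this example, not the format scala-pickling uses.

```scala
object CombinatorSketch {
  // The simplified trait from the slides.
  trait Pickler[T] {
    // returns next write position
    def pickle(arr: Array[Byte], i: Int, x: T): Int
    // returns result plus next read position
    def unpickle(arr: Array[Byte], i: Int): (T, Int)
  }

  // 1. A base pickler: an Int as 4 big-endian bytes (layout is an assumption).
  val intPickler: Pickler[Int] = new Pickler[Int] {
    def pickle(arr: Array[Byte], i: Int, x: Int): Int = {
      arr(i)     = (x >>> 24).toByte
      arr(i + 1) = (x >>> 16).toByte
      arr(i + 2) = (x >>> 8).toByte
      arr(i + 3) = x.toByte
      i + 4
    }
    def unpickle(arr: Array[Byte], i: Int): (Int, Int) = {
      val x = ((arr(i) & 0xff) << 24) | ((arr(i + 1) & 0xff) << 16) |
              ((arr(i + 2) & 0xff) << 8) | (arr(i + 3) & 0xff)
      (x, i + 4)
    }
  }

  // 2. A combinator: given a Pickler[T], build a Pickler[List[T]]
  //    that writes the length first, then each element in order.
  def listPickler[T](elem: Pickler[T]): Pickler[List[T]] = new Pickler[List[T]] {
    def pickle(arr: Array[Byte], i: Int, xs: List[T]): Int =
      xs.foldLeft(intPickler.pickle(arr, i, xs.length)) { (pos, x) =>
        elem.pickle(arr, pos, x)
      }
    def unpickle(arr: Array[Byte], i: Int): (List[T], Int) = {
      val (n, start) = intPickler.unpickle(arr, i)
      var pos = start
      val out = List.newBuilder[T]
      for (_ <- 0 until n) {
        val (x, next) = elem.unpickle(arr, pos)
        out += x
        pos = next
      }
      (out.result(), pos)
    }
  }
}
```

Because listPickler returns an ordinary Pickler, it composes with itself: listPickler(listPickler(intPickler)) is a pickler for List[List[Int]].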
Pickler Combinators
Build a pickler for pairs (Int, String): combine an Int pickler and a String pickler.
val myPairPickler = tuple2Pickler(intPickler, stringPickler)
What’s the type?
Implicit Picklers
Can we combine them automatically?
A Pickler[T] can pickle objects of type T.
Goal:
def pickle(implicit pickler: Pickler[(Int, String)]) = {
  pickler.pickle((32, "yay!"))
}
tuple2Pickler can be an implicit def.
It can take intPickler and stringPickler as implicit parameters.
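A minimal sketch of how implicit resolution can assemble a pair pickler automatically. To keep it short, Pickler here produces a String rather than writing into a byte array; that simplification, and the exact output syntax, are assumptions for this example only.

```scala
object ImplicitPicklerSketch {
  // Simplified for illustration: pickles to a String.
  trait Pickler[T] { def pickle(x: T): String }

  implicit val intPickler: Pickler[Int] = new Pickler[Int] {
    def pickle(x: Int): String = x.toString
  }
  implicit val stringPickler: Pickler[String] = new Pickler[String] {
    def pickle(x: String): String = "\"" + x + "\""
  }

  // The combinator is an implicit def whose inputs are implicit parameters,
  // so the compiler can build Pickler[(A, B)] from Pickler[A] and Pickler[B].
  implicit def tuple2Pickler[A, B](implicit pa: Pickler[A], pb: Pickler[B]): Pickler[(A, B)] =
    new Pickler[(A, B)] {
      def pickle(x: (A, B)): String =
        "(" + pa.pickle(x._1) + "," + pb.pickle(x._2) + ")"
    }

  def pickle[T](x: T)(implicit pickler: Pickler[T]): String = pickler.pickle(x)
}
```

With `import ImplicitPicklerSketch._`, a call such as `pickle((32, "yay!"))` needs no explicit pickler argument, and nested pairs resolve recursively through the same implicit def.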
Customize what you pickle!
case class Person(name: String, age: Int, salary: Int)

class CustomPersonPickler(implicit val format: PickleFormat) extends SPickler[Person] {
  def pickle(picklee: Person, builder: PBuilder): Unit = {
    builder.beginEntry(picklee)
    builder.putField("name", b =>
      b.hintTag(FastTypeTag.ScalaString).beginEntry(picklee.name).endEntry())
    builder.putField("age", b =>
      b.hintTag(FastTypeTag.Int).beginEntry(picklee.age).endEntry())
    builder.endEntry()
  }
}

implicit def genCustomPersonPickler(implicit format: PickleFormat) = new CustomPersonPickler
Pickle Format
Output any format!
trait PickleFormat {
  type PickleType <: Pickle
  def createBuilder(): PBuilder
  def createReader(pickle: PickleType, mirror: Mirror): PReader
}

trait PBuilder extends Hintable {
  def beginEntry(picklee: Any): this.type
  def putField(name: String, pickler: this.type => Unit): this.type
  def endEntry(): Unit
  def beginCollection(length: Int): this.type
  def putElement(pickler: this.type => Unit): this.type
  def endCollection(length: Int): Unit
  def result(): Pickle
}
Pickle Format
Example: output edn, Clojure’s data transfer format (talk to a Clojure app).
Toy builder implementation: https://gist.github.com/heathermiller/5760171
scala> import scala.pickling._
import scala.pickling._

scala> import edn._
import edn._

scala> case class Person(name: String, kidsAges: Array[Int])
defined class Person

scala> val joe = Person("Joe", Array(3, 4, 13))
joe: Person = Person(Joe,[I@3d925789)

scala> joe.pickle.value
res0: String = #pickling/Person { :name "Joe" :kidsAges [3, 4, 13] }
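The idea of swapping the output format by changing which implicit is in scope can be sketched without the library. ToyFormat, ToyJson, ToyEdn, and picklePerson below are hypothetical names for this example; the real PickleFormat and PBuilder interfaces shown on the slides are richer than this.

```scala
object FormatSketch {
  // Hypothetical, much-simplified stand-in for a pickle format.
  trait ToyFormat {
    def render(typeName: String, fields: Seq[(String, String)]): String
  }

  object ToyJson extends ToyFormat {
    def render(typeName: String, fields: Seq[(String, String)]): String =
      (("\"tpe\": \"" + typeName + "\"") +: fields.map { case (k, v) => "\"" + k + "\": " + v })
        .mkString("{ ", ", ", " }")
  }

  object ToyEdn extends ToyFormat {
    def render(typeName: String, fields: Seq[(String, String)]): String =
      fields.map { case (k, v) => ":" + k + " " + v }
        .mkString("#pickling/" + typeName + " { ", " ", " }")
  }

  case class Person(name: String, age: Int)

  // The pickling logic is written once; the format is an implicit parameter.
  def picklePerson(p: Person)(implicit fmt: ToyFormat): String =
    fmt.render("Person", Seq("name" -> ("\"" + p.name + "\""), "age" -> p.age.toString))
}
```

Passing ToyEdn or ToyJson (explicitly, or by bringing one into implicit scope) changes the output without touching picklePerson, which mirrors the `import json._` versus `import edn._` switch in the REPL sessions.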
What can be pickled?
• Types for which an implicit pickler is in scope: if there is an implicit of type Pickler[Foo] in scope, you can pickle instances of Foo.
• Types for which our framework can generate picklers: classes, case classes, generic classes, singleton objects, primitives & primitive arrays, ...
IMPORTANT: The implicit picklers are used in the generation!
Can’t (yet): instances of inner classes, Types
Status
Release 0.8.0 for Scala 2.10.2
No support for inner classes, yet
ScalaCheck tests
Very soon: support for cyclic object graphs (for release 0.9.0)
Plan:
1.0 release within the next few months
Integration with sbt, Spark, and Akka, ...
Experiment: use Scala-pickling to speed up the Scala compiler
Goal: Scala 2.11 as target
SIP for Scala 2.11
And now onto something completely different.
Scala spores!
http://docs.scala-lang.org/sips/pending/spores.html
Closures are great.
But, we can’t really distribute them.
Why?
Often not serializable: they capture stuff that’s not serializable.
Accidental capture: it's easy to reference something and unknowingly capture it.
...enclosing
Runtime errors instead of compile-time checks.
Whose fault is it? For a user, it's often unclear whether it’s a user error or the framework.
Consequences that follow from these problems...
Framework builders avoid them...
...not just in their public APIs, but private ones too.
Users shoot themselves in the foot and blame the framework.
Rightfully so.
When picking battles, framework designers tend to avoid issues with closures.
Enter: Spores
What are they?
Small units of possibly mobile functional behavior.
Proposed for inclusion in Scala 2.11
Spores: what are they?
A closure-like abstraction for use in distributed or concurrent environments.
Goal: Well-behaved closures with controlled environments that can avoid various hazards.
Potential hazards when using closures incorrectly:
• memory leaks
• race conditions due to capturing mutable references
• runtime serialization errors due to unintended capture of references
Motivating example: Spark
class MyCoolRddApp {
  val param = 3.14
  val log = new Log(...)
  ...
  def work(rdd: RDD[Int]) {
    rdd.map(x => x + param)
       .reduce(...)
  }
}
Problem: (x => x + param) is not serializable because it captures this of type MyCoolRddApp, which is itself not serializable.
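The capture problem can be reproduced without Spark, using only Java serialization. In this sketch, App is a hypothetical stand-in for MyCoolRddApp, and the check assumes a Scala version in which function literals are serializable (2.12 or later); copying the field into a local val first is the usual manual workaround, shown here for contrast.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object CaptureDemo {
  // Deliberately not Serializable, like MyCoolRddApp on the slide.
  class App {
    val param = 3.14

    // `param` is really `this.param`, so the closure captures `this`.
    def capturesThis: Int => Double = x => x + param

    // Copy the field into a local val; the closure captures only a Double.
    def capturesLocal: Int => Double = {
      val p = param
      x => x + p
    }
  }

  // True if Java serialization accepts the object, false if it hits
  // something that is not serializable.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Both closures compute the same result; only the first one drags the whole enclosing App instance along, and that is discovered at runtime rather than at compile time.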
Motivating example: Akka/futures
def receive = {
  case Request(data) =>
    future {
      val result = transform(data)
      sender ! Response(result)
    }
}
An Akka actor spawns a future to concurrently process incoming requests.
Problem: sender is not a stable value! It’s a method call!
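The "method call, not a stable value" hazard can be shown deterministically without Akka. In this sketch, currentSender and sender are hypothetical stand-ins for an actor's mutable state and its sender method; the closures are run by hand instead of by a future.

```scala
object StableValueSketch {
  // Stand-in for actor state that changes with every incoming message.
  var currentSender = "alice"

  // A method, like `sender` in an Akka actor: re-evaluated on every call.
  def sender: String = currentSender

  // Evaluates `sender` whenever the closure eventually runs.
  val unsafe: () => String = () => sender

  // Evaluates `sender` now and captures the resulting stable value.
  val safe: () => String = {
    val s = sender
    () => s
  }
}
```

If currentSender changes to "bob" before the closures run, unsafe() yields "bob" while safe() still yields "alice", which is the behavior the request handler on the slide actually wants.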
Motivating example: Serialization
case class Helper(name: String)

class Main {
  val helper = Helper("the helper")

  val fun: Int => Unit = (x: Int) => {
    val result = x + " " + helper.toString
    println("The result is: " + result)
  }
}
Problem: fun is not serializable. It accidentally captures this, since helper.toString is really this.helper.toString, and Main (the type of this) is not serializable.
Ok. Got it. We need safer closures for concurrent & distributed scenarios.
Sure. What do spores look like?
Basic usage:
val s = spore {
  val h = helper
  (x: Int) => {
    val result = x + " " + h.toString
    println("The result is: " + result)
  }
}
The body of a spore consists of 2 parts:
1. a sequence of local value (val) declarations only (the “spore header”), and
2. a closure
Spore Guarantees... (vs. closures)
1. All captured variables are declared in the spore header, or using capture
2. The initializers of captured variables are executed once, upon creation of the spore
3. References to captured variables do not change during the spore’s execution
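Guarantee 2 can be illustrated in plain Scala by writing the header-plus-closure shape as an ordinary block. This is only an emulation of the evaluation order: there is no spore macro here, so nothing checks that the closure refers solely to header values. helper and initCount are hypothetical names for this sketch.

```scala
object SporeHeaderSketch {
  // Counts how many times the header initializer runs.
  var initCount = 0
  def helper(): String = { initCount += 1; "the helper" }

  // Plain-Scala stand-in for: spore { val h = helper; (x: Int) => ... }
  val s: Int => String = {
    val h = helper()          // "header": evaluated once, at creation
    (x: Int) => s"$x $h"      // closure: refers only to x and h
  }
}
```

Calling s any number of times leaves initCount at 1, because the initializer ran when s was created and the closure only reads the already-computed h.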
Spores & closures
Evaluation semantics: Remove the spore marker, and the code behaves as before.
Spores & closures are related: You can write a full function literal and pass it to something that expects a spore. (Of course, only if the function literal satisfies the spore rules.)
How can you use a spore?
In APIs: if you want parameters to be spores, then you can write it this way:
def sendOverWire(s: Spore[Int, Int]): Unit = ...
// ...
sendOverWire((x: Int) => x * x - 2)
How can you use a spore?
In for-comprehensions:
def lookup(i: Int): DCollection[Int] = ...
val indices: DCollection[Int] = ...

for {
  i <- indices
  j <- lookup(i)
} yield j + capture(i)

trait DCollection[A] {
  def map[B](sp: Spore[A, B]): DCollection[B]
  def flatMap[B](sp: Spore[A, DCollection[B]]): DCollection[B]
}
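Why the for-comprehension needs capture(i) becomes clearer once it is desugared: the yield expression ends up inside a nested function passed to map, and that inner function refers to i from the outer one. The sketch below shows the standard desugaring on a plain List; DCollection and capture are not used, and this lookup is a hypothetical stand-in.

```scala
object ForDesugarSketch {
  // Hypothetical stand-in for the lookup on the slide.
  def lookup(i: Int): List[Int] = List(i, i * 10)
  val indices: List[Int] = List(1, 2)

  // The for-comprehension form.
  val viaFor: List[Int] =
    for {
      i <- indices
      j <- lookup(i)
    } yield j + i

  // What the compiler turns it into: the inner function literal
  // (j => j + i) captures i from the enclosing one.
  val viaCalls: List[Int] =
    indices.flatMap(i => lookup(i).map(j => j + i))
}
```

When map and flatMap take spores instead of ordinary functions, that inner reference to i is exactly the captured variable that has to be declared, which is what capture(i) does in the slide's version.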
What does all of that get you?
Since captured expressions are evaluated upon spore creation, that means spores are like function values with an immutable environment.
Plus, the environment is specified and checked: no accidental capturing.
Or, graphically...
[Figure: spores vs. closures, (1) right after creation and (2) during execution. Values shown: 5, ‘a’; ??]
I’m in ur stuff, draggin around ur object graf
Status
SIP-21 Spores: on docs.scala-lang.org/sips now!
Get involved in the discussion!
Pull request for Scala 2.11 and Akka 2.2.1 in preparation
Integration with Scala-pickling planned