Myself
• Clément Béra
• 2011-2013: Engineer on the Pharo VM
• 2013-2017: PhD student
• Optimisations of the Pharo VM JIT compiler
[Figure: Binary tree benchmark, bar chart (axis 0 to 16) across VM generations: Interpreter (2005), Stack (2009), Cog V1 (2010), Cog V2 (2011), Spur (2014), Sista (future). A second build of the slide adds the annotation "Pharo 5, 2016".]
Plan
• Pharo 5 (stable)
• First release to outperform most competitors on benchmarks
• Pharo 6 (release expected next week?)
• Pharo 7
Pharo 5: Spur
• Efficient scavenges
• In most applications, most GC time is now in scavenges
[Figure labels: Code execution, GC]
Pharo 6: New compactor
Loading a 200 MB Moose model in a 250 MB image

                     February   April
Total time           2 min      1 min 2 sec
Time in full GC      1 min      2 sec
Full GC avg pause    15 sec     0.5 sec
Time in scavenge     15 sec     15 sec  <- GC tuning gets it down to 5 sec
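Figures like "time in full GC" and "time in scavenge" can be read from inside the image. The following is a minimal sketch, not the measurement setup used for the table. It assumes the VM parameter indices 8 (total ms in full GC) and 10 (total ms in scavenges), which should be checked against the VM version in use. The workload block is a hypothetical placeholder.

```smalltalk
"Sketch: measure GC time around a workload.
 Assumption: VM parameter 8 is total ms spent in full GC and
 parameter 10 is total ms spent in scavenges (verify for your VM)."
| vm fullMsBefore scavengeMsBefore workload |
vm := Smalltalk vm.
fullMsBefore := vm parameterAt: 8.
scavengeMsBefore := vm parameterAt: 10.
"Placeholder workload: allocate many short-lived objects."
workload := [ 1 to: 1000000 do: [ :i | Array new: 10 ] ].
workload value.
Transcript
	show: 'Full GC ms: ', ((vm parameterAt: 8) - fullMsBefore) printString; cr;
	show: 'Scavenge ms: ', ((vm parameterAt: 10) - scavengeMsBefore) printString; cr.
```

Replacing the placeholder with a real load (for example a model import) gives the same kind of breakdown as the table. The slide does not say which tuning was applied, so none is shown here.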
Pharo 7: Incremental GC?
• Full GC pauses: ~500 ms at ~500 MB
• Java default GC: ~200 ms, soft real time
• Solution
• Incremental marking
• Incremental compaction
Code execution
• Pharo 5:
• Spur got 1.8x
• Pharo 6:
• Polishing and micro-optimisations
• Pharo 7:
• Sista gets 1.5x-5x
Pharo 6
• Register allocation improvements
• Two path compilation
• Frameless code for setter-like methods
Sista: Pharo 7?
• Program introspection
• Speculate on types based on previous runs
• Optimize frequently used code
• Deoptimize and reoptimize code whose speculation proved incorrect
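The speculate-then-fall-back idea in the bullets above can be pictured at source level as a guarded fast path. This is only a conceptual sketch. It is not what Sista generates, and the variable names and the choice of `Point` are hypothetical.

```smalltalk
"Conceptual sketch only: a type guard written at source level.
 Not Sista's actual output; all names here are illustrative."
| receiver speculatedClass result |
receiver := 3 @ 4.
speculatedClass := Point.	"class assumed to have been seen on previous runs"
result := receiver class == speculatedClass
	ifTrue: [ receiver x ]	"fast path: stands in for an inlined method body"
	ifFalse: [ receiver perform: #x ].	"guard failed: fall back to a normal send"
result
```

In the real system, a failed guard triggers deoptimization back to unoptimized code, not an explicit else branch. The sketch only shows where the speculation sits.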
Program readability
1 to: array size do: [ :i | (array at: i) yourself ].
array do: [ :elem | elem yourself ].
array do: #yourself.
[Figure: throughput chart for the loop forms above; axis labels 0, 2, 5, 20. Data labels in extraction order (series attribution not recoverable):
87M/sec, 28M/sec, 13M/sec, 3.7M/sec
15M/sec, 21M/sec, 10M/sec, 3.9M/sec
94M/sec, 40M/sec, 22M/sec, 6.5M/sec]
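The three loop forms from the slide can be compared in a Playground with `BlockClosure>>bench`. This is a minimal sketch. The array size of 20 is an arbitrary choice, and the numbers will differ from the slide depending on VM and machine.

```smalltalk
"Sketch: compare the three loop forms. Array size 20 is an arbitrary choice."
| array |
array := Array new: 20 withAll: 0.
{ [ 1 to: array size do: [ :i | (array at: i) yourself ] ].
  [ array do: [ :elem | elem yourself ] ].
  [ array do: #yourself ] }
	do: [ :loop | Transcript show: loop bench printString; cr ].
```

Each `bench` call runs its block repeatedly for several seconds, so the whole snippet takes a while to finish.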
Performance
[Figure: bar chart (axis 0 to 6) comparing Sista and Pharo on A*, ThreadRing, SpectralNorm, JSJSON, BinaryTree, DeltaBlue, Richards, TCAP, Kmeans.]
Getting stable
• Supports most development workflows
• Supports image recompilation
• Integration has started
In-image design
Smalltalk image
• Scorch (optimising JIT): CompiledCode to CompiledCode, Smalltalk-specific optimisations
• CompiledCode (persisted across start-ups)
Virtual machine
• Cogit (baseline JIT): CompiledCode to native code, machine-specific optimisations
• Native functions (discarded on shut-down)
We are looking for…
• Use-cases showing what to improve
• Large real-world benchmarks
• Contributors
• Investment