Date post: | 12-Jul-2015 |
Category: |
Software |
Upload: | marcus-lagergren |
Design Rationales in the JRockit JVM
Marcus Lagergren
Java Language team, Oracle
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
• In the beginning…
• What did we accomplish / Internals
  – Code Generation
  – Memory Management
  – Threads & Synchronization
• Externals
  – The Java Mission Control suite
  – A parenthesis on JRockit VE
• Q&A
About the speaker
@lagergren
Buy this book!
• M. Sc. from KTH, Stockholm
  – Narrowly escaped doing a PhD on bit security in cryptographic systems
• Runtime, OS and compiler engineer since 1999, with some startup breaks
• One of the original creators of the JRockit JVM
In the beginning

Appeal Virtual Machines
• Appeal Software Solutions
  – Consulting, almost exclusively Java by 1997
• Still the pre-app server era
• We saw that Java would be great on the server side
  – Shorter development cycles: money in the bank
    • Buffer overrun protection
    • Automatic memory management
    • Write once run everywhere
• Tremendous scalability problems
• Sun ClassicVM was all-encompassing
JavaOne 1997
• Sun Microsystems presents the HotSpot virtual machine
  – “WOW! This is the way to do it! Adaptive runtimes!”

JavaOne 1998
• Sun Microsystems presents the HotSpot virtual machine again
  – “WTF! This is slide-by-slide the exact same presentation as last year!?!”
  – We can’t wait any longer. Let’s build our own VM. How hard can it be?
Creating our own JVM - JRockit

Productize a narrower domain?
• Server-side usage only. Headless.
  – We need to help the early app server vendors get performance and scalability
• No interpreter
  – “Startup time doesn’t matter on the server anyway”
• Green threads or n x m threads.
  – Explicit parallelism was all-pervasive.
• Incremental GC
  – We thought something like [Seligman, Grarup] would suffice.
• Support ourselves on consulting only.
  – Nope, needed venture capital
The Java License
• You can’t call yourself “Java” without a Java license
• You need to pass the TCK test suite
  – Not available without license
• To get a Java License you need a “value add”
• What’s a “value add”?
  – Superior performance!
  – What? You didn’t like that?
  – OK… Let’s see… Err.. “manageability”
• Java License was granted 2001
  – Helped us partner up with BEA Systems and Intel
  – BEA acquired us in 2002
  – Oracle acquired BEA in 2008
  – Oracle acquired Sun in 2010
What did we accomplish?

The real value adds turned out to be:
• Multi-tiered support for paying customers
  – Part of the WLS stack
• Monitoring and Serviceability
  – JRockit Mission Control (now Java Mission Control)
  – Record and introspect production systems with zero overhead.
• Pioneered “Soft realtime” GC
  – Deterministic GC
  – Low latency GC
• Virtualization
  – JRockit Virtual Edition: an operating system for Java
  – Shorter paths between Java and hardware
  – Hypervisor required
  – JRockit VE on virtual hardware outperformed physical Linux!
• The benchmark wars
  – Constantly keeping it going with Sun and IBM, driving Java server-side performance
• JRockit became the default JVM in the Oracle stack in 2008
• ExaLogic
…and then
INTERNALS!
@SimmsUpNorth
Code Generation

Code generation - No Interpreter
• Keep test matrix small
• Keep operational complexity down
• Targeting server side apps: warmup a small issue
• “Code caching / AOT can be done later”

Code generation - One JIT
• Keep test matrix small
• Keep operational complexity down
• Run it in different modes, with maximum code reuse
• Same IR throughout
  – With gradual augmentations

But…
• Startup became a problem
  – We removed optimizers and added as a “spine” to the normal JIT pipeline.
• Lazy code generation through trampolines
• Same mechanism for code invalidation
• Bookkeeping to identify a program point down to any individual machine instruction
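The trampoline idea above can be sketched in plain Java. This is only an illustration under assumptions, not JRockit's mechanism: `LazyMethodSlot`, `compile` and `invalidate` are hypothetical names, and "code generation" is modeled as simply handing back a function object. Callers always go through the slot. The slot starts out pointing at a trampoline, which generates the code on first use and patches the slot; invalidation points the slot back at the trampoline.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical call slot: starts as a trampoline, patched to generated code on first call. */
final class LazyMethodSlot {
    private final IntUnaryOperator source;     // stands in for the method's bytecode
    private volatile IntUnaryOperator target;  // what callers actually invoke
    private int compileCount = 0;

    LazyMethodSlot(IntUnaryOperator source) {
        this.source = source;
        this.target = this::trampoline;        // nothing is generated up front
    }

    /** First call lands here: generate code, patch the slot, then run the new code. */
    private int trampoline(int arg) {
        IntUnaryOperator compiled = compile();
        target = compiled;                     // "back-patch" the call target
        return compiled.applyAsInt(arg);
    }

    /** Placeholder for the JIT; in this sketch it just returns the source function. */
    private synchronized IntUnaryOperator compile() {
        compileCount++;
        return source;
    }

    int call(int arg) {
        return target.applyAsInt(arg);
    }

    /** Invalidation reuses the same mechanism: point the slot back at the trampoline. */
    void invalidate() {
        target = this::trampoline;
    }

    synchronized int compileCount() {
        return compileCount;
    }
}
```

In this sketch two threads racing through the trampoline may both "compile"; a real runtime would have to decide whether that is acceptable or must be serialized.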
Code Generation
• Same “spine” used in all tiers of code generation
Optimizations
• In and out of SSA
• Applied to all levels of IR
  – Loop peeling, value numbering, String append explosion, type check removal, sign extension elimination, copy propagation, bounds check removal, virtual to fixed calls, inlining, if short circuiting, straightening, strength reduction, constant propagation, dead code removal, out of loop hoisting, explode objects and array copies, boxing & unboxing removal, local escape analysis, ASM peephole optimization, redundant memory access removal, etc etc etc…
• Support for regionalized IRs
• Graph Fusion Register Allocator
Optimization Targets
• Thread sampling
• Partly taken over by safe point based approach in R28
• Some code instrumentation, for example for inlining path
  – Not in the general case, e.g. invocation counters
• Hardware sampling where available
  – Only good thing about IA64?
  – Could also match e.g. L2 misses to program points
• Bugging the processor manufacturers since 2002 about userland PC sample buffer.
• JRockit VE: x 1000 more samples, proven to shorten warmup significantly
HotSpot style?
• On-stack replacement?
• Deoptimization?
• Never much cared for any of it ;-)

HotSpot Style OSR and Deoptimization
• We’ve never found a practical use case.
  – So we can’t ever swap out the main function with the microbenchmark loop. Who cares?
• An assumption is invalidated
  – Either patch code directly or use a guard when generating it in the first place
• A large assumption
  – Write a trap in the code and schedule lazy regeneration of entire method
• Not strictly true for dynamic languages
“Garbage collecting code”
• Code kept in binary tree of code blocks ~ 64M
  – More if large pages enabled
• Class loader unloading → garbage collection
• Reference count to active code modified when backpatching
• Specialized usage of code blocks.
  – Trampolines only
  – Optimized code only
Bytecode is bad - kill it quickly
• What’s with the goto:s?
• Why can it express more than Java source code?
  – OK we understand the multi language concept, we sorta forgive you.
  – But man, dominators and loop analysis: that’s a lot of compile time
• …and why is it a stack machine AND a register machine with 65535 registers at the same time!?
• Initially tried to reconstruct ASTs
  – Obfuscators etc made this pretty hopeless.
• ~15% of the klocs in JRockit/codegen do flow control analysis on the goto:s
The IR
• Use IR everywhere (or Java)
• The IR should ideally reflect any of several pluggable frontends.
  – We played around with CLR a bit.
  – These days: dynamic languages :-)
• No Sea of Nodes
• No HotSpot style “high level IR is low level”
• Simple IR in MIR form (platform independent)

The IR - Design Rationale
• We had some compiler experience and wanted to be on track quickly. Do it the traditional way.
• We are not “wrong”. LLVM is very similar.
• Tiered: highest tier == always high level
  • Hardware agnostic.
  • No architecture specific memory ops
• Tiered: lowest tier == always the native architecture instruction for instruction.
  • A gradual transition.
  • A CPU has no sea of nodes.

The IR
• Highest IR level may have operations as operands
• Intrinsics everywhere
  – arraycopy, membar, cmpuXX, sse4IndexOf, doubleToLongBits, crypto, Math.sin and so on …
• Regret not doing more in SSA form

The IRInfo “database”
• Lazily computable information
  – Liveness
  – Dominators
  – Loop information
  – Aliases
  – Type inference
  – Ranges
  – Nullness analysis
  – …
  – Invalidate on modification.
• Not a very stable model.
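A lazily computed, invalidate-on-modification analysis cache of the kind the IRInfo slide describes might look roughly like the sketch below. All names (`LazyIrInfo`, `Analysis`, `register`) are made up for illustration, and the invalidation here is deliberately coarse: any modification drops every cached result.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-method analysis cache: compute on first request, drop on IR change. */
final class LazyIrInfo {
    enum Analysis { LIVENESS, DOMINATORS, LOOPS }

    private final Map<Analysis, Supplier<Object>> computations = new EnumMap<>(Analysis.class);
    private final Map<Analysis, Object> cache = new EnumMap<>(Analysis.class);
    private int computeCount = 0;

    /** Registers how an analysis is computed; nothing runs until get() is called. */
    void register(Analysis analysis, Supplier<Object> computation) {
        computations.put(analysis, computation);
    }

    /** Returns the cached result, computing it first if needed (analysis must be registered). */
    Object get(Analysis analysis) {
        Object result = cache.get(analysis);
        if (result == null) {
            result = computations.get(analysis).get();
            cache.put(analysis, result);
            computeCount++;
        }
        return result;
    }

    /** Called by any transform that modifies the IR: all cached results become stale. */
    void invalidate() {
        cache.clear();
    }

    int computeCount() {
        return computeCount;
    }
}
```

The convenience comes at a price that fits the "not a very stable model" remark: every transform has to remember to call `invalidate()`, and a single forgotten call leaves later passes working on stale data.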
Memory management

Transition: object layout, types and livemaps…

Object layout and types
• Object headers should be fixed sized.
• JRockit Object header is 32 + 32 bits on all platforms with some content variations.
• [Grove] ramblings on object models
• Type tree maintained similar to [Krall, Vitek, Horspool]

Livemaps (oopmaps)
• Registers and stack slots on the local frame that contain objects.
• Nothing strange here. Required for non-conservative garbage collection of any sort.
• Internal pointer bit
• Forms the root set.
• Rollforwarding vs the safepoint approach

Transition - Livemaps
Memory management
• Garbage collectors
  – Concurrent
  – Parallel
  – Deterministic
• With or without generations

Memory management
• Concurrent collection
  – Your basic generational concurrent mark and sweep collector [Printezis, Detlefs]
  – Supports multi generation (>1) young spaces.
    • Combats heavy object allocation situations.
    • Adaptively balanced against copy overhead
  – Write barriers before object writes
  – Minimize stopping the world
  – Young collections use a variant of stop & copy
• Can also run with a parallel policy
  – Stop the world and clean up quickly
  – Only throughput oriented
  – No write barriers, as there is no need for a card table
Mark & Sweep
• Backbone of GC based on traditional tri-color mark and sweep
• Adaptive thread usage and additional concurrency
• Two colors, not three.
  – Object is in one of two sets
  – Live objects: grey bits (mix of grey & black objects in traditional tri-coloring)
  – Distinction handled by putting grey objects in thread local queues for each GC thread.
  – Parallel threads can work on thread local data
  – Efficient prefetching is possible due to FIFO order.
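The two-color scheme can be illustrated with a small single-threaded sketch. It is a simplification under assumptions: objects are array indices, `edges[i]` lists what object `i` references, and there is one queue instead of one per GC thread. A single bit set records "reached" (grey or black in tri-color terms), and whether an object still needs scanning is encoded only by its presence in the FIFO queue.

```java
import java.util.ArrayDeque;
import java.util.BitSet;

/** Hypothetical marker: one "grey bits" set plus a FIFO work queue, no separate black set. */
final class TwoColorMarker {
    /** edges[i] lists the indices of the objects that object i references. */
    static BitSet mark(int[][] edges, int[] roots) {
        BitSet greyBits = new BitSet(edges.length);      // set = reached (grey or black)
        ArrayDeque<Integer> queue = new ArrayDeque<>();  // reached but not yet scanned

        for (int root : roots) {
            if (!greyBits.get(root)) {
                greyBits.set(root);
                queue.addLast(root);
            }
        }
        while (!queue.isEmpty()) {
            int object = queue.pollFirst();              // FIFO: scan in discovery order
            for (int referenced : edges[object]) {
                if (!greyBits.get(referenced)) {
                    greyBits.set(referenced);
                    queue.addLast(referenced);
                }
            }
        }
        return greyBits;                                 // clear bits are garbage
    }
}
```

Because the queue is consumed in the order it was filled, the next few objects to scan are known in advance, which is what makes prefetching them practical.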
No permgen! Ever!

Other nice features
• No permgen!!! Ever!
• Pinned objects.
  – Fast memory buffers.
  – Also enable non-contiguous heaps.
• Compaction
  – “Internal and external”.
  – G1 evacuates regions instead with a stop-the-world-and-copy policy similar to JRockit YC
Memory management
• Concurrent GC has an additional set: live bits
  – Contains all live objects in the system, including the newly created ones.
  – JRockit can quickly find objects that have been created during a concurrent mark phase.
  – Card tables
    • Not just for generational GC
    • Also to avoid searching the entire live object graph when a concurrent mark phase cleans up.
    • Just look at dirty cards at the end of the mark phase.
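As a rough illustration of the card-table idea, the sketch below divides a heap into fixed-size cards and lets a write barrier dirty the card that covers the written field. The 512-byte card size and all names are assumptions for the example, not JRockit's actual values. At the end of a concurrent mark phase only the dirty cards would need rescanning.

```java
import java.util.Arrays;

/** Hypothetical card table: one byte per 512-byte card of heap. */
final class CardTable {
    static final int CARD_SHIFT = 9;  // 2^9 = 512-byte cards (assumed size)
    private final byte[] cards;

    CardTable(long heapBytes) {
        // Round up so a partial last card still gets an entry.
        cards = new byte[(int) ((heapBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: mark the card covering the heap offset being written. */
    void recordWrite(long heapOffset) {
        cards[(int) (heapOffset >>> CARD_SHIFT)] = 1;
    }

    boolean isDirty(int card) {
        return cards[card] != 0;
    }

    /** Number of cards a collector would have to revisit. */
    int dirtyCount() {
        int count = 0;
        for (byte card : cards) {
            if (card != 0) count++;
        }
        return count;
    }

    /** Reset after the dirty cards have been processed. */
    void clear() {
        Arrays.fill(cards, (byte) 0);
    }
}
```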
Young Collections
• A variant of stop and copy is used.
  – All threads are halted and objects are deleted or promoted
  – Hierarchical breadth first copy for cache locality
    • Parallelizes nicely
    • Many threads always harvest a young space
• Young and old collections may occur at the same time.
  – All bit sets and data structures can be shared as long as the old collection is guaranteed to see all cards that have become dirty during a concurrent phase. (Extra card table to record this “difference”: “modified union set”)
  – Keep this intact for old collection

Thread Local Allocation
• Thread local allocation
• Thread local areas are roughly L2 cache sized and objects are allocated here before they are forced upon the heap
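Allocation inside a thread local area is typically a pointer bump, which needs no synchronization because only the owning thread touches it. The sketch below models addresses as plain `long` values; the class name, the 8-byte alignment and the `-1` "area exhausted" convention are assumptions for illustration.

```java
/** Hypothetical thread local area with bump-pointer allocation. */
final class ThreadLocalArea {
    private final long end;  // first address past the area
    private long top;        // next free address

    ThreadLocalArea(long start, long sizeBytes) {
        this.top = start;
        this.end = start + sizeBytes;
    }

    /** Returns the address of the new object, or -1 if the area is exhausted. */
    long allocate(long sizeBytes) {
        long aligned = (sizeBytes + 7) & ~7L;  // round up to 8-byte alignment
        if (end - top < aligned) {
            return -1;                         // caller would fetch a fresh area from the heap
        }
        long address = top;
        top += aligned;
        return address;
    }
}
```

Only the refill path, where a thread asks the shared heap for a new area, needs any coordination with other threads.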
Compressed References
• For less than 4 (or 4 * x) GB of maximum heap size
• Use 32 bit pointers (or 32 + log2(x) bits)

    CompRef compress(Ref ref) {
        return (uint32_t)ref; //truncate reference to 32-bits
    }

    Ref decompress(CompRef ref) {
        return globalHeapBase | ref;
    }

For the 4 * x GB case, shift by the object alignment:

    CompRef compress(Ref ref) {
        return (uint32_t)(ref >> log2(objectAlignment));
    }

    Ref decompress(CompRef ref) {
        //widen before shifting so the upper bits are not lost
        return globalHeapBase | ((Ref)ref << log2(objectAlignment));
    }
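The arithmetic can be checked with a small Java sketch that models references as `long` addresses. The constants are assumptions chosen for the example: 8-byte object alignment (a shift of 3, so 32 + 3 = 35 address bits, or 32 GB) and a heap base aligned to 2^35, so that OR-ing the base back in cannot collide with the offset bits.

```java
/** Hypothetical model of shifted 32-bit compressed references. */
final class CompressedRefs {
    static final int ALIGNMENT_SHIFT = 3;    // log2(8-byte object alignment), assumed
    static final long HEAP_BASE = 1L << 35;  // assumed base, aligned to the 32 GB range

    /** Drop the always-zero alignment bits, then keep the low 32 bits. */
    static int compress(long ref) {
        return (int) (ref >>> ALIGNMENT_SHIFT);
    }

    /** Widen to 64 bits as unsigned, restore the alignment bits, add the base back. */
    static long decompress(int compressed) {
        return HEAP_BASE | ((compressed & 0xFFFF_FFFFL) << ALIGNMENT_SHIFT);
    }
}
```

The unsigned widening in `decompress` matters: without the mask, a compressed value with its top bit set would be sign-extended and produce an address outside the heap.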
Deterministic GC
• QoS level for latencies. “No more than X ms”
• Down to single digits on modern x86 hardware
• Caveat: live data on heap is the main constraint.
  – Up to 50% of heap live data still feasible

Deterministic GC - How?
• Greedy strategy
  – Postpone stopping the world for as long as possible.
  – Maybe the problem goes away and we don’t have to stop the world
• Split up everything into work packets
  – Drop them at any time.
• Efficient parallelization.
  – Mark phase is 90% of GC time
• Efficient heuristics
  – Some more work in e.g. write barriers
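One possible reading of the work-packet idea is sketched below: GC work is cut into small units, and the collector checks a deadline between units so it can stop as soon as the pause budget is used up. The names and the injected clock are assumptions made so the example is deterministic. In this sketch unfinished packets simply stay queued for a later increment, which is only one interpretation of "drop".

```java
import java.util.ArrayDeque;
import java.util.function.LongSupplier;

/** Hypothetical runner that processes small work packets until a deadline passes. */
final class WorkPacketRunner {
    private final ArrayDeque<Runnable> packets = new ArrayDeque<>();

    void add(Runnable packet) {
        packets.addLast(packet);
    }

    /**
     * Runs packets while time remains. The clock is checked only between packets,
     * so the overshoot is bounded by the cost of a single packet.
     * Returns the number of packets completed.
     */
    int runUntil(long deadline, LongSupplier clock) {
        int completed = 0;
        while (!packets.isEmpty() && clock.getAsLong() < deadline) {
            packets.pollFirst().run();
            completed++;
        }
        return completed;
    }

    int pending() {
        return packets.size();
    }
}
```

The smaller the packets, the tighter the bound on how far a pause can overrun its target, at the cost of more bookkeeping per unit of work.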
Threads and Synchronization
• A java.lang.Thread is a native thread.
  – Interesting, though: thread pooling and pseudo thin-threads are back, for example in Akka.
  – Java 8: Collection.parallelStream
  – The world is moving towards implicit parallelism in general
• Most of the JRockit thread code and adaptivity logic is written in Java
• Locks are thin or fat
  – Adaptive inflation and deflation
• Lazy locking (biased locking supported)
  – Adaptive heuristics for banning and retrying the lazy approach.
Threads and Synchronization

    public class PseudoSpinlock {
        private static final int LOCK_FREE = 0;
        private static final int LOCK_TAKEN = 1;

        public void lock() {
            //burn cycles
            while (cmpxchg(LOCK_TAKEN, &lock) == LOCK_TAKEN) {
                micropause(); //optional
            }
        }

        public void unlock() {
            int old = cmpxchg(LOCK_FREE, &lock);
            //guard against recursive locks
            assert(old == LOCK_TAKEN);
        }
    }
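The pseudocode on the slide is not valid Java (it takes the address of a field). A runnable approximation, assuming Java 9 or later for `Thread.onSpinWait()`, can use `AtomicInteger.getAndSet` in place of the slide's `cmpxchg` and an exception in place of the `assert`. The class name `SpinLock` and the `isLocked` helper are additions for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Runnable approximation of the slide's PseudoSpinlock; not reentrant. */
final class SpinLock {
    private static final int LOCK_FREE = 0;
    private static final int LOCK_TAKEN = 1;

    private final AtomicInteger state = new AtomicInteger(LOCK_FREE);

    void lock() {
        // Atomically write LOCK_TAKEN; if the old value was already LOCK_TAKEN,
        // someone else holds the lock, so keep burning cycles.
        while (state.getAndSet(LOCK_TAKEN) == LOCK_TAKEN) {
            Thread.onSpinWait();  // optional hint to the CPU, like the slide's micropause()
        }
    }

    void unlock() {
        int old = state.getAndSet(LOCK_FREE);
        // Guard against unlocking a lock that was not held.
        if (old != LOCK_TAKEN) {
            throw new IllegalMonitorStateException("unlock of a free lock");
        }
    }

    boolean isLocked() {
        return state.get() == LOCK_TAKEN;
    }
}
```

As on the slide, a thread that calls `lock()` twice without unlocking spins forever, which is why real thin locks also record the owner and a recursion count.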
Threads and Synchronization
• Locks are thin when first taken
• Time spent in lock and times taken triggers inflation
• wait or notify immediately inflates a lock
• Fat locks are also deflated when uncontended for too long

Threads and Synchronization
• Thin lock lifecycle
• Thin & fat lock lifecycle
Lock Pairing
• Bytecode again: no restriction on matching monitorenter with monitorexit
• Not all of them can be analyzed by the JIT
• We can store what we know, and make unlocks quick.
  – Lock tokens (the object OR 3 bits)
    • Thin, fat, recursive, lazily taken, unmatched
  – Livemap system contains nesting order.
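"The object OR 3 bits" suggests a tagged pointer: with 8-byte aligned objects the low three address bits are always zero and can carry the lock kind. The sketch below shows that encoding with addresses modeled as `long`. The numeric values assigned to each kind are invented for the example and are not JRockit's.

```java
/** Hypothetical lock token: object address with the lock kind in the low 3 bits. */
final class LockToken {
    static final long KIND_MASK = 0b111;

    // Assumed numbering, for illustration only.
    static final int THIN = 0;
    static final int FAT = 1;
    static final int RECURSIVE = 2;
    static final int LAZY = 3;
    static final int UNMATCHED = 4;

    static long encode(long objectAddress, int kind) {
        if ((objectAddress & KIND_MASK) != 0) {
            throw new IllegalArgumentException("address must be 8-byte aligned");
        }
        if (kind < 0 || kind > KIND_MASK) {
            throw new IllegalArgumentException("kind must fit in 3 bits");
        }
        return objectAddress | kind;
    }

    /** Mask the tag off to recover the object address. */
    static long object(long token) {
        return token & ~KIND_MASK;
    }

    /** The tag tells the unlock path what to do without inspecting the object. */
    static int kind(long token) {
        return (int) (token & KIND_MASK);
    }
}
```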
Optimizations
• A lot of smallish code gen transforms: e.g. Lock fusion
• “Fat spin”
• Lazy unlocking (biased locking)
  – Start assuming all locks are lazy. Tag thin locks as lazily locked.
  – If object already lazily locked
    • If it’s the same thread: profit
    • Else: stop the lock holder, detect the “real” lock state by stack walk. Convert to thin lock or forcefully unlock it
  – Transfer bits
  – Heuristics: object and class banning. Ageing.

Threads and Synchronization
• Thin, fat & lazy lock lifecycle
Export it all! - JRockit Mission Control
(now Java Mission Control)
@javamissionctrl
$JAVA_HOME/bin/jmc

Mission Control
• Use “free” runtime information!
  – JRockit (Java) Mission Control
    • JRockit (Java) flight recorder
    • Memory leak detector (JRockit only)
    • Management console
• $JAVA_HOME/bin/jcmd (used to be JRCMD)
• Everything in the VM abstracted into an event that may or may not have a duration
• Soon: public API
Java Flight Recorder
• Always on
  – Excellent for debugging and analysis of crashes
  – Can be set to record more intrusively for periods in production
    • E.g. extensive lock profiling
• Everything is an event
• Buffered recording: the last n seconds available at any crash or when a command is given.
• Very fine precision.
  – Multimedia timers and system hardware support required for e.g. latencies
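The "last n seconds are always available" behaviour can be approximated with a time-windowed buffer of events, sketched below. This is not the Flight Recorder API; `RecentEventBuffer` and its `Event` type are hypothetical, and the sketch assumes events arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer that keeps only events from the most recent time window. */
final class RecentEventBuffer {
    static final class Event {
        final String name;
        final long timestampMillis;
        final long durationMillis;  // 0 for instantaneous events

        Event(String name, long timestampMillis, long durationMillis) {
            this.name = name;
            this.timestampMillis = timestampMillis;
            this.durationMillis = durationMillis;
        }
    }

    private final long windowMillis;
    private final ArrayDeque<Event> events = new ArrayDeque<>();

    RecentEventBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Appends an event and evicts everything older than the window. */
    void record(Event event) {
        events.addLast(event);
        long cutoff = event.timestampMillis - windowMillis;
        while (!events.isEmpty() && events.peekFirst().timestampMillis < cutoff) {
            events.removeFirst();
        }
    }

    /** Snapshot taken at a crash or when a dump command is given. */
    List<Event> dump() {
        return new ArrayList<>(events);
    }
}
```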
Latency Analysis

The Management Console
• Peek into the running production JVM
• Add triggers on events
• Interact with the VM: force GC etc.

The Memory Leak Detector
• Introspect the type graph in realtime. Look for types that are growing despite GC:s
• “Trending allocations”

Studying a recording offline

JRockit Virtual Edition

Is the JVM an OS?
• Add a cooperative aspect to thread switching
• Zero-copy networking code
• Reduce cost of entering OS
• Balloon driver
• Runs only on hypervisor
• Facilitates pauseless GC
Thank you!

Would you like to know more?
Oracle JRockit - the Definitive Guide