Posted: 25-May-2015 | Category: Technology | Uploaded by: srisatish-ambati
How to Stop Worrying and Start Caching in Java
SriSatish Ambati, Azul Systems; Surtani, Red Hat
The Trail
Examples
Elements of Cache Performance
Theory
Metrics
200GB Cache Design
Whodunit
Overheads in Java – Objects, GC
Locks
Communication, Sockets, Queues (SEDA)
Serialization
Measure
He who has the choice has the agony! (German proverb: “Wer die Wahl hat, hat die Qual!”)
Some example caches
Homegrown caches: surprisingly, they work well
(do it yourself! It’s a giant hash)
Infinispan, Coherence, GemStone, GigaSpaces, Ehcache, etc.
NoSQL stores (Apache Cassandra)
Non-Java alternatives: memcached & clones.
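A homegrown cache can indeed start out as a giant hash. The sketch below is hypothetical: it assumes a single JVM and a caller-supplied `loader` function standing in for the slow source (a database, say). `ConcurrentHashMap.computeIfAbsent` runs the loader atomically, so concurrent misses on the same key trigger one load rather than many.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Minimal homegrown cache: a concurrent hash map plus a loader.
// Hypothetical class; it has no eviction or expiration, so it grows without bound.
class HomegrownCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database lookup (assumption)

    HomegrownCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, loading it on first access.
    // computeIfAbsent applies the loader atomically, at most once per absent key.
    V get(K key) {
        return map.computeIfAbsent(key, loader);
    }

    void invalidate(K key) {
        map.remove(key);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        HomegrownCache<Integer, String> cache =
            new HomegrownCache<>(k -> { loads.incrementAndGet(); return "value-" + k; });
        cache.get(1);
        cache.get(1); // served from the map, no second load
        cache.get(2);
        System.out.println(cache.size() + " entries, " + loads.get() + " loads");
    }
}
```

What it lacks is what the rest of this talk is about: eviction, expiration, size limits and distribution.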
Visualize Cache: simple example
[Figure: Replicated Cache vs. Distributed Cache]
Elements of Cache Performance
Hot or Not: The 80/20 rule.
A small set of objects is very popular!
Hit or Miss: Hit Ratio
How effective is your cache?
Eviction policies (LRU, LFU, FIFO, LIRS, ...) and expiration
Long-lived objects, better locality
Spikes happen
Cascading events: node load, node(s) dead.
Cache thrash: full table scan.
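An LRU policy and a hit-ratio counter fit in a few lines using `LinkedHashMap` in access order plus `removeEldestEntry`. This is a single-threaded sketch; the class name, the `lookup` method and the capacity are illustrative assumptions. Hit ratio here is hits / (hits + misses).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: LinkedHashMap in access order evicts the least
// recently used entry once capacity is exceeded. Not thread-safe.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
    private long hits;
    private long misses;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the head, i.e. the least recently used entry
    }

    // Lookup that records a hit or a miss for hit-ratio accounting.
    V lookup(K key) {
        V v = get(key);
        if (v == null) {
            misses++;
        } else {
            hits++;
        }
        return v;
    }

    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Because `get` reorders entries in an access-ordered `LinkedHashMap`, even reads mutate the map, so concurrent use needs external locking; that is one reason LRU costs more under contention than it looks.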
Elements of Cache Performance : Metrics
Inserts: Puts/sec, Latencies
Reads: Gets/sec, Latencies, Indexing
Updates: mods/sec, latencies (Locate, Modify & Notify)
Replication, Consistency, Persistence
Size of Objects
Number of Objects
Size of Cache
Number of cache server nodes (read-only, read-write)
Number of clients
Partitioning & Distributed Caches
Near Cache/L1 Cache
Bring data close to the logic that is using it.
Birds of a feather flock together: related data lives closer together
Read-only nodes, Read-Write nodes
Management nodes
Communication Costs
Balancing (buckets)
Serialization (more later)
I/O considerations
Asynchronous
Sockets
Queues & Threads serving the sockets
Bandwidth
Persistence:
File, DB (CacheLoaders)
Write-behind
Data access patterns of doom, e.g.:
“Death by a million cuts”: batch your reads.
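Write-behind acknowledges a put as soon as the in-memory copy is updated and persists it later, in batches, which is the write-side answer to “death by a million cuts”. A minimal sketch: the `BackingStore` interface is hypothetical (a stand-in for a CacheLoader/CacheStore-style DB or file writer), and the batch size is an assumption. A crash before a flush loses the queued writes, which is the usual write-behind trade-off.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical persistence hook (e.g. a JDBC batch insert or a file append).
interface BackingStore<K, V> {
    void storeAll(List<Map.Entry<K, V>> batch);
}

// Minimal write-behind cache: puts update memory immediately and are queued;
// one background thread drains the queue and persists in batches.
class WriteBehindCache<K, V> implements AutoCloseable {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final LinkedBlockingQueue<Map.Entry<K, V>> pending = new LinkedBlockingQueue<>();
    private final BackingStore<K, V> store;
    private final int maxBatch;
    private final Thread flusher;
    private volatile boolean running = true;

    WriteBehindCache(BackingStore<K, V> store, int maxBatch) {
        this.store = store;
        this.maxBatch = maxBatch;
        this.flusher = new Thread(this::flushLoop, "write-behind");
        this.flusher.setDaemon(true);
        this.flusher.start();
    }

    void put(K key, V value) {
        memory.put(key, value); // visible to readers at once
        pending.add(new AbstractMap.SimpleImmutableEntry<>(key, value)); // persisted later
    }

    V get(K key) {
        return memory.get(key);
    }

    private void flushLoop() {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        while (running || !pending.isEmpty()) {
            try {
                Map.Entry<K, V> first = pending.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing queued; re-check whether we should stop
                }
                batch.add(first);
                pending.drainTo(batch, maxBatch - 1); // group writes into one store call
                store.storeAll(new ArrayList<>(batch));
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Signals the flusher to stop and waits until all queued writes are stored.
    @Override
    public void close() {
        running = false;
        try {
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```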
Buckets and partitions; hashing function
Birthdays, HashMaps & prime numbers
Collisions, chaining
Unbalanced HashMap:
behaves like a list, O(n) retrieval
Partition-aware HashMaps
Non-blocking HashMaps
(see: Locks)
Performance degrades at 80% table density
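Two ideas from this slide in code: picking a bucket or partition from a key's hash, and the birthday problem, which says collisions arrive far sooner than intuition suggests. For n keys hashed uniformly into m buckets, the chance of at least one collision is P = 1 - prod(1 - i/m) for i = 0..n-1. The bit-spreading step mirrors what java.util.HashMap does (XOR of the high 16 bits into the low bits); the class, the method names and the prime partition count 271 are illustrative assumptions.

```java
// Hypothetical helpers illustrating hash partitioning and the birthday bound.
final class Partitioning {
    private Partitioning() {}

    // Spread high bits into low bits (as java.util.HashMap does) so keys whose
    // hashCodes differ only in the high bits don't pile into the same bucket.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Partition index in [0, partitions); floorMod keeps it non-negative.
    static int partitionFor(Object key, int partitions) {
        return Math.floorMod(spread(key.hashCode()), partitions);
    }

    // Probability that at least two of n keys share a bucket among m buckets,
    // assuming a uniform hash: 1 - prod_{i=0}^{n-1} (1 - i/m).
    static double collisionProbability(int n, long m) {
        double noCollision = 1.0;
        for (int i = 0; i < n; i++) {
            noCollision *= 1.0 - (double) i / m;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        System.out.println(collisionProbability(23, 365)); // classic birthday case, about 0.507
        System.out.println(partitionFor("customer:42", 271));
    }
}
```

With 23 keys in 365 buckets a collision is already more likely than not, so chaining (or probing) is the normal case, not an edge case.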
Imagine – John Lennon
How many nodes to get a 200 GB cache?
Who needs a 200 GB cache?
Disk is the new tape!
200 nodes @ 1 GB heap each
2 nodes @ 100 GB heap each
(plus overhead)
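The slide's arithmetic, generalized: nodes = ceil(data × copies / (heap per node × usable fraction)). The usable fraction stands in for the “(plus overhead)”, for example keeping live data at or below about 50% of the heap, in line with the GC slide's warning that GC overheads dominate above 50% residency. The number of copies (primary plus backups) is an assumption; the slide's figures correspond to one copy and no overhead.

```java
// Hypothetical capacity-planning helper for the "200 GB cache" question.
final class CacheSizing {
    private CacheSizing() {}

    /**
     * @param dataGb         logical cache data (e.g. 200 GB)
     * @param copies         primary + backup copies (1 = no backups)
     * @param heapGbPerNode  heap size of each cache node
     * @param usableFraction fraction of the heap allowed to hold live cache data
     *                       (e.g. 0.5 to keep residency at or below 50%)
     */
    static long nodesNeeded(double dataGb, int copies, double heapGbPerNode, double usableFraction) {
        double usablePerNode = heapGbPerNode * usableFraction;
        return (long) Math.ceil(dataGb * copies / usablePerNode);
    }

    public static void main(String[] args) {
        System.out.println(nodesNeeded(200, 1, 1, 1.0));   // slide's 1 GB case, no overhead
        System.out.println(nodesNeeded(200, 1, 100, 1.0)); // slide's 100 GB case, no overhead
        System.out.println(nodesNeeded(200, 2, 1, 0.5));   // one backup, 50% residency cap
    }
}
```

The last case shows how quickly “200 nodes” becomes several hundred once backups and GC headroom are counted.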
SIDE ONE
Join together with the band
I don’t even know myself
SIDE TWO
Let’s see action
Relay
Don’t happen that way at all
The seeker
Java Limits: Objects are not cheap!
How many bytes for an 8-char String?
(assume 32-bit)
String:
JVM overhead 16 bytes
bookkeeping fields 12 bytes
pointer (to char[]) 4 bytes
char[]:
JVM overhead 16 bytes
data 16 bytes
A. 64 bytes
31% overhead
Size of String varies with JVM.
How many objects in a Tomcat idle instance?
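The slide's String arithmetic, written out with its own 32-bit figures as parameters: 16-byte object and array headers, 12 bytes of bookkeeping fields (offset, count and hash in older JVMs), a 4-byte reference to the char[], and 2 bytes per char. Rounding each object up to an 8-byte boundary is an assumption, though typical of HotSpot. Newer JVMs differ (compact strings store Latin-1 text as one byte per char), so treat the constants as inputs, not facts about your JVM.

```java
// Estimates String footprint from the slide's 32-bit constants (assumptions,
// not measurements); change them to match your JVM.
final class StringFootprint {
    static final int OBJECT_HEADER = 16; // "JVM overhead" of the String object, per the slide
    static final int ARRAY_HEADER = 16;  // "JVM overhead" of the char[]
    static final int STRING_FIELDS = 12; // bookkeeping fields (offset, count, hash)
    static final int REFERENCE = 4;      // pointer from String to char[]
    static final int CHAR_BYTES = 2;     // UTF-16 char
    static final int ALIGNMENT = 8;      // assumed object alignment

    private StringFootprint() {}

    // Rounds a size up to the next multiple of ALIGNMENT.
    static int align(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Total bytes for a String of the given length: the String object plus its char[].
    static int bytesFor(int length) {
        int stringObject = align(OBJECT_HEADER + STRING_FIELDS + REFERENCE);
        int charArray = align(ARRAY_HEADER + CHAR_BYTES * length);
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        int total = bytesFor(8);
        int data = CHAR_BYTES * 8;
        System.out.println(total + " bytes total, " + data + " bytes of character data");
    }
}
```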
Picking the right collection: Mozart or Bach?
100 elements of:
TreeMap<Double, Double>
82% overhead, 88 bytes constant cost per element (40-byte entry + two 24-byte Doubles)
[pro: enables updates while maintaining order]
double[], double[]
2% overhead, amortized
[con: load-then-use]
Sparse collections, empty collections, wrong collections.
TreeMap: fixed overhead 48 bytes
TreeMap$Entry: per-entry overhead 40 bytes
data: Double key, Double value
Double: JVM overhead 16 bytes + data (a double) 8 bytes
*From one 32-bit JVM. Varies with JVM architecture.
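The double[], double[] option can be sketched as a read-only sorted map: load all pairs, sort once by key, then look up with binary search. That is the “load-then-use” trade-off in code: no per-entry objects, only two primitive arrays, but no cheap inserts after the build. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical read-only replacement for TreeMap<Double, Double>:
// two primitive arrays sorted by key, searched with binary search.
final class SortedDoubleMap {
    private final double[] keys;
    private final double[] values;

    // Builds the map from parallel arrays; keys need not be sorted on input.
    SortedDoubleMap(double[] inKeys, double[] inValues) {
        if (inKeys.length != inValues.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        int n = inKeys.length;
        Integer[] order = new Integer[n]; // temporary boxing, only during the build
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(inKeys[a], inKeys[b]));
        keys = new double[n];
        values = new double[n];
        for (int i = 0; i < n; i++) {
            keys[i] = inKeys[order[i]];
            values[i] = inValues[order[i]];
        }
    }

    // Returns the value for key, or defaultValue if absent: O(log n), no allocation.
    double get(double key, double defaultValue) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : defaultValue;
    }

    int size() {
        return keys.length;
    }
}
```

If the data must change after loading, that is when the TreeMap's 88 bytes per element start paying for themselves.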
JEE is not cheap either!
Apache Tomcat 6.0, allocated objects:
Class name | Size (B) | Count | Avg (B)
Total | 21,580,592 | 228,805 | 94.3
char[] | 4,215,784 | 48,574 | 86.8
byte[] | 3,683,984 | 5,024 | 733.3
Built-in VM methodKlass | 2,493,064 | 16,355 | 152.4
Built-in VM constMethodKlass | 1,955,696 | 16,355 | 119.6
Built-in VM constantPoolKlass | 1,437,240 | 1,284 | 1,119.30
Built-in VM instanceKlass | 1,078,664 | 1,284 | 840.1
java.lang.Class[] | 922,808 | 45,354 | 20.3
Built-in VM constantPoolCacheKlass | 903,360 | 1,132 | 798
java.lang.String | 753,936 | 31,414 | 24
java.lang.Object[] | 702,264 | 8,118 | 86.5
java.lang.reflect.Method | 310,752 | 2,158 | 144
short[] | 261,112 | 3,507 | 74.5
java.lang.Class | 255,904 | 1,454 | 176
int[][] | 184,680 | 2,032 | 90.9
java.lang.String[] | 173,176 | 1,746 | 99.2
java.util.zip.ZipEntry | 172,080 | 2,390 | 72

JBoss 5.1, allocated objects:
Class name | Size (B) | Count | Avg (B)
Total | 1,410,764,512 | 19,830,135 | 71.1
char[] | 423,372,528 | 4,770,424 | 88.7
byte[] | 347,332,152 | 1,971,692 | 176.2
int[] | 85,509,280 | 1,380,642 | 61.9
java.lang.String | 73,623,024 | 3,067,626 | 24
java.lang.Object[] | 64,788,840 | 565,693 | 114.5
java.util.regex.Matcher | 51,448,320 | 643,104 | 80
java.lang.reflect.Method | 43,374,528 | 301,212 | 144
java.util.HashMap$Entry[] | 27,876,848 | 140,898 | 197.9
java.util.TreeMap$Entry | 22,116,136 | 394,931 | 56
java.util.HashMap$Entry | 19,806,440 | 495,161 | 40
java.nio.HeapByteBuffer | 17,582,928 | 366,311 | 48
java.nio.HeapCharBuffer | 17,575,296 | 366,152 | 48
java.lang.StringBuilder | 15,322,128 | 638,422 | 24
java.util.TreeMap$EntryIterator | 15,056,784 | 313,683 | 48
java.util.ArrayList | 11,577,480 | 289,437 | 40
java.util.HashMap | 7,829,056 | 122,329 | 64
java.util.TreeMap | 7,754,688 | 107,704 | 72

Million objects | Allocated | Live
JBoss 5.1 | 20 | 4
Apache Tomcat 6.0 | 0.25 | 0.1
Java Limits: Garbage Collection
GC defines cache configuration
Pause times: if stop_the_world_pause > time_to_live, the node is declared dead.
Allocation rate: write and insertion speed.
Live objects (residency):
if residency > 50%, GC overheads dominate.
Increasing heap size only increases pause times.
64-bit is not going to rescue us either:
it increases object header, alignment & pointer overhead,
a 40-50% increase in heap sizes for the same workloads.
Overheads: cycles spent in GC (vs. real work); space
Fragmentation, Generations
Fragmentation: compact often; prefer uniformly sized objects
[Finding seats for a gang-of-four
is easier in an empty theater!]
Face your fears, face them often!
Generational Hypothesis
Long-lived objects promote often:
inter-generational pointers, more old-gen collections.
Entropy: how many flags does it take to tune your GC?
Avoid OOM; configure node death on OOM.
Shameless plug: Azul’s Pauseless GC (now a software edition),
Cooperative-Memory (swap space for your JVM under spikes: no more OOM!)
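The pause-time rule at the top of this slide is how many clusters detect failure: a node that sends no heartbeat for longer than a timeout is declared dead, and from the outside a long stop-the-world pause looks exactly like a dead node. A minimal sketch with an injectable clock so the behavior can be tested; the class name, the timeout and the heartbeat API are hypothetical, not any particular product's.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical timeout-based failure detector. A node is suspected dead when
// no heartbeat has arrived within timeoutMillis, which is what a long
// stop-the-world GC pause on that node looks like to its peers.
class HeartbeatDetector {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    private final LongSupplier clock; // milliseconds; injectable for testing

    HeartbeatDetector(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    HeartbeatDetector(long timeoutMillis) {
        this(timeoutMillis, System::currentTimeMillis);
    }

    // Called whenever a heartbeat from nodeId is received.
    void heartbeat(String nodeId) {
        lastSeen.put(nodeId, clock.getAsLong());
    }

    // True if the node was never heard from or has been silent too long.
    boolean isDead(String nodeId) {
        Long t = lastSeen.get(nodeId);
        return t == null || clock.getAsLong() - t > timeoutMillis;
    }
}
```

The design choice the slide points at: either keep worst-case pauses below the timeout, or raise the timeout and accept slower detection of nodes that really have died.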
Locks: Why Amdahl’s law trumps Moore’s!
Schemes: optimistic, pessimistic
Consistency: eventual vs. ACID
Contention, waits
java.util.concurrent, critical sections: use lock striping
MVCC, lock-free and wait-free data structures (NBHM)
Transactions are expensive: reduce JTA abuse, set the right isolation levels.
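Amdahl's law is why contention matters: with a serial fraction s, the speedup on N cores is at most 1 / (s + (1 - s) / N), so one global lock caps throughput however many cores you add. Lock striping shrinks the serial part by guarding each key with one of several locks chosen by hash. A minimal sketch; the class name and stripe count are assumptions. ConcurrentHashMap already provides fine-grained concurrency internally, so striping is mostly useful for your own structures or compound read-modify-write operations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BiFunction;

// Hypothetical lock-striped map: keys are spread over N independent segments,
// each with its own lock, so threads touching different stripes don't contend.
class StripedMap<K, V> {
    private final ReentrantLock[] locks;
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedMap(int stripes) {
        locks = new ReentrantLock[stripes];
        segments = (Map<K, V>[]) new Map[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();
        }
    }

    // Same key always maps to the same stripe.
    private int stripeFor(Object key) {
        int h = key.hashCode();
        return Math.floorMod(h ^ (h >>> 16), locks.length);
    }

    V get(K key) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }

    // Atomic read-modify-write holding only this key's stripe lock.
    V compute(K key, BiFunction<? super K, ? super V, ? extends V> fn) {
        int s = stripeFor(key);
        locks[s].lock();
        try {
            return segments[s].compute(key, fn);
        } finally {
            locks[s].unlock();
        }
    }
}
```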
Inter-node communication
• TCP for mgmt & data: Infinispan
• TCP for mgmt, UDP for data: Coherence, Infinispan
• UDP for mgmt, TCP for data: Cassandra, Infinispan
• Instrumentation: Ehcache/Terracotta
• Bandwidth & latency considerations:
  Ensure proper network configuration in the kernel
  Run datagram tests
  Limit the number of nodes, especially management nodes
Sockets, Queues, Threads, Stages
How many sockets?
Multi-socket implementations: GemStone (VMware), Infinispan
Alternatively: increase ports, nodes, clients
How many threads per socket? Mux
Asynchronous I/O and events (Apache MINA, JGroups)
The curious case of a single-threaded queue manager
Reduce context switching
SEDA
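SEDA (staged event-driven architecture) splits request handling into stages, each with its own bounded queue and small thread pool, so a slow stage backs up its own queue instead of requiring a thread per connection and constant context switching. A minimal sketch built on java.util.concurrent; the `Stage` class, pool sizes and queue capacity are illustrative assumptions, and `CallerRunsPolicy` is just one possible back-pressure choice.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// One SEDA stage: a bounded event queue drained by a fixed, small thread pool.
class Stage<E> {
    private final Consumer<E> handler;
    private final ThreadPoolExecutor pool;

    Stage(String name, int threads, int queueCapacity, Consumer<E> handler) {
        this.handler = handler;
        this.pool = new ThreadPoolExecutor(
            threads, threads, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            r -> new Thread(r, name + "-worker"),
            // Back-pressure: when the queue is full, the submitting thread runs
            // the event itself, which slows the upstream stage down.
            new ThreadPoolExecutor.CallerRunsPolicy());
    }

    void enqueue(E event) {
        pool.execute(() -> handler.accept(event));
    }

    // Stops accepting events and waits for queued ones to finish.
    boolean shutdownAndWait(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Stages chain by having one stage's handler enqueue into the next, e.g. a two-thread “decode” stage feeding a one-thread “apply” stage; shut stages down upstream first so no event is dropped.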
Marshal Arts: Serialization/Deserialization
java.io.Serializable is S.L.O.W
+ Use “transient”
+ jserial, Avro, etc.
+ Google Protocol Buffers, Portable Object Format (POF, Coherence)
+ JBoss Marshalling
+ Externalizable + byte[]
+ Roll your own
Serialization + deserialization micro-benchmark (uBench):
http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
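“Externalizable + byte[]” and “roll your own” both come down to writing the fields yourself instead of letting java.io.Serializable reflect over the object graph and emit class descriptors. A minimal sketch for a hypothetical `CacheEntry` value; the class and its fields are illustrative, and a real format would also want a version tag for schema evolution. (“transient” matters for plain Serializable classes; with hand-written code you simply don't write a field.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.UncheckedIOException;

// Hypothetical cache value with hand-written (de)serialization.
class CacheEntry implements Externalizable {
    String key;
    long version;
    double price;

    public CacheEntry() {} // Externalizable requires a public no-arg constructor

    CacheEntry(String key, long version, double price) {
        this.key = key;
        this.version = version;
        this.price = price;
    }

    // Externalizable path: used when the object goes through ObjectOutputStream.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(key);
        out.writeLong(version);
        out.writeDouble(price);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        key = in.readUTF();
        version = in.readLong();
        price = in.readDouble();
    }

    // "Roll your own" path: raw bytes, no stream header or class descriptor.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(key);
            out.writeLong(version);
            out.writeDouble(price);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    static CacheEntry fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Arguments are evaluated left to right, matching the write order.
            return new CacheEntry(in.readUTF(), in.readLong(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a 4-character ASCII key the hand-rolled form is 22 bytes (2-byte length prefix + 4 + 8 + 8); measure against your own workload with a benchmark like the one linked above before committing to a format.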
Count what is countable, measure what is measurable, and what is not measurable, make measurable. – Galileo
Latency: Where have all the millis gone?
Measure the 90th percentile. Look for consistency.
JMX is great! JMX is also very slow.
Fewer nodes means fewer MBeans!
Monitor network, memory and CPU (e.g. with Ganglia).
Know thyself: application footprint, trend data.
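Measuring the 90th percentile in practice means recording per-operation latencies and reading percentiles off the sorted samples rather than averaging, because an average hides the slow tail, and GC pauses live in the tail. A minimal sketch using the nearest-rank definition of a percentile; the class name is an assumption, and for long-running production use a histogram-based recorder avoids storing every sample.

```java
import java.util.Arrays;

// Hypothetical latency recorder: stores samples (nanoseconds) and reports
// nearest-rank percentiles. Not thread-safe.
class LatencyRecorder {
    private long[] samples = new long[1024];
    private int count;

    void record(long nanos) {
        if (count == samples.length) {
            samples = Arrays.copyOf(samples, count * 2); // grow geometrically
        }
        samples[count++] = nanos;
    }

    // Times one operation (e.g. a cache get) and records its latency.
    void time(Runnable op) {
        long start = System.nanoTime();
        op.run();
        record(System.nanoTime() - start);
    }

    // Nearest-rank percentile: the smallest sample such that at least p% of
    // all samples are less than or equal to it. p in (0, 100].
    long percentile(double p) {
        if (count == 0) {
            throw new IllegalStateException("no samples");
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Comparing percentile(50) with percentile(90) and percentile(99) over time is one simple way to “look for consistency”: a widening gap usually points at pauses, queuing or contention rather than at the average operation.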
Q&A
References:
Nick Mitchell (IBM), Making Sense of Large Heaps
Aleksandar Seovic, Oracle Coherence 3.5
Large Pages in Java, http://andrigoss.blogspot.com/2008/02/jvm-performance-tuning.html
Patterns of Doom, http://3.latest.googtst23.appspot.com/
Infinispan Demos, http://community.jboss.org/wiki/5minutetutorialonInfinispan
Tom Lubinski, RTView, http://www.sl.com/pdfs/SL-BACSIG-100429-final.pdf
Google Protocol Buffers, http://code.google.com/p/protobuf/
Azul’s Pauseless GC, http://www.azulsystems.com/technology/zing-virtual-machine
Cliff Click, Non-Blocking Hash Map, http://sourceforge.net/projects/high-scale-lib/
JVM Serialization Benchmarks, http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
Optimization hinders evolution – Alan Perlis