Date post: | 23-Jan-2017 |
Category: | Technology |
Upload: | richardwarburton |
Java Collections: The Force Awakens
Darth @RaoulUK, Darth @RichardWarburto
#javaforceawakens
Evolution can be interesting ...
Java 1.2 to Java 10?
Collection API Improvements
Persistent & Immutable Collections
Performance Improvements
Collection bugs
1. Element access (off-by-one error, ArrayIndexOutOfBoundsException)
2. Concurrent modification
3. Check-then-act
Scenario 1
List<String> jedis = new ArrayList<>(asList("Luke", "yoda"));
for (String jedi: jedis) {
if (Character.isLowerCase(jedi.charAt(0))) {
jedis.remove(jedi);
}
}
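The loop in Scenario 1 removes from the list while a for-each iterator is still walking it, which for this input ends in a ConcurrentModificationException. A minimal sketch of two common ways around that follows; the class and method names (`SafeRemoval`, `withRemoveIf`, `withIterator`) are made up for illustration and are not part of the original deck.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class, for illustration only.
class SafeRemoval {

    // Java 8+: removeIf performs the iteration and the removal in one call.
    static List<String> withRemoveIf(List<String> jedis) {
        List<String> copy = new ArrayList<>(jedis);
        copy.removeIf(jedi -> Character.isLowerCase(jedi.charAt(0)));
        return copy;
    }

    // Before Java 8: remove through the iterator, never through the list.
    static List<String> withIterator(List<String> jedis) {
        List<String> copy = new ArrayList<>(jedis);
        for (Iterator<String> it = copy.iterator(); it.hasNext(); ) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> jedis = Arrays.asList("Luke", "yoda");
        System.out.println(withRemoveIf(jedis)); // [Luke]
        System.out.println(withIterator(jedis)); // [Luke]
    }
}
```

Both variants work on a copy here only so that the input list stays untouched; calling `removeIf` directly on the original `ArrayList` works just as well.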
Scenario 2
Map<String, BigDecimal> movieViews = new HashMap<>();
BigDecimal views = movieViews.get(MOVIE);
if(views != null) {
movieViews.put(MOVIE, views.add(BigDecimal.ONE));
}
Check: movieViews.get, views != null
Then act: movieViews.put
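In Scenario 2 the check (get, null test) and the act (put) are separate calls, so another thread can change the map in between. One possible fix, sketched below, is to let the map do both steps in a single atomic call. The class `MovieViews`, its method names and the value of `MOVIE` are assumptions made for this example; `computeIfPresent` and `merge` are standard `ConcurrentMap` methods since Java 8.

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical wrapper around the map from Scenario 2.
class MovieViews {
    static final String MOVIE = "The Force Awakens"; // assumed key, not from the slides

    private final ConcurrentMap<String, BigDecimal> movieViews = new ConcurrentHashMap<>();

    // Same behaviour as Scenario 2 (only existing entries are incremented),
    // but check and update happen atomically. Returns null if the key is absent.
    BigDecimal incrementIfPresent(String movie) {
        return movieViews.computeIfPresent(movie, (key, views) -> views.add(BigDecimal.ONE));
    }

    // Variant that also covers the absent case: starts the count at 1.
    BigDecimal increment(String movie) {
        return movieViews.merge(movie, BigDecimal.ONE, BigDecimal::add);
    }

    public static void main(String[] args) {
        MovieViews stats = new MovieViews();
        System.out.println(stats.incrementIfPresent(MOVIE)); // null, nothing to update yet
        System.out.println(stats.increment(MOVIE));          // 1
        System.out.println(stats.incrementIfPresent(MOVIE)); // 2
    }
}
```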
Reducing scope for bugs
● ~280 bugs in 28 projects including Cassandra, Lucene
● ~80% check-then-act bugs discovered are put-if-absent
● Library designers can help by updating APIs as new idioms emerge
● Different data structures can provide alternatives by restricting reads & updates to reduce scope for bugs
CHECK-THEN-ACT Misuse of Java Concurrent Collections
http://dig.cs.illinois.edu/papers/checkThenAct.pdf
Java 9 API updates
Collection factory methods
● Non-goal to provide persistent immutable collections
● http://openjdk.java.net/jeps/269
Live Demo using jShell
http://iteratrlearning.com/java9/2016/11/09/java9-collection-factory-methods
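For readers without the live demo, here is a small sketch of the Java 9 factory methods from JEP 269. The class name `FactoryMethods` and the sample data are invented for this example; `List.of`, `Set.of` and `Map.of` are the real JDK 9 methods.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only; requires Java 9 or later.
class FactoryMethods {
    static final List<String> JEDIS = List.of("Luke", "Yoda");
    static final Set<String> SITH = Set.of("Vader", "Sidious");
    static final Map<String, Integer> EPISODES =
            Map.of("A New Hope", 4, "The Force Awakens", 7);

    // The returned collections reject every mutation attempt.
    static boolean rejectsUpdates(List<String> list) {
        try {
            list.add("Rey");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(JEDIS);                        // [Luke, Yoda]
        System.out.println(SITH.contains("Vader"));       // true
        System.out.println(EPISODES.get("A New Hope"));   // 4
        System.out.println(rejectsUpdates(JEDIS));        // true
    }
}
```

These collections are immutable but not persistent: there is no cheap way to derive a modified version from them, which matches the non-goal stated in the JEP.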
Persistent & Immutable Collections
Categorising Collections
Mutable: Unsynchronized, Concurrent, Unmodifiable View
Immutable: Non-Persistent, Persistent
Legend: Available in Core Library
Mutable
● Popular friends include ArrayList, HashMap, TreeSet
● Memory-efficient modification operations
● State can be accidentally modified
● Can be thread-safe, but requires careful design
Unmodifiable
List<String> jedis = new ArrayList<>();
jedis.add("Luke Skywalker");
List<String> cantChangeMe = Collections.unmodifiableList(jedis);
// java.lang.UnsupportedOperationException
//cantChangeMe.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker]
jedis.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
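The example above shows that an unmodifiable view still reflects changes to the backing list. If a stable snapshot is wanted instead, one option is to copy first and wrap the copy. The helper below is a sketch with invented names (`Snapshot`, `snapshotOf`); it uses only `java.util` classes available since Java 1.2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class Snapshot {

    // Copies the source, then hands out a read-only wrapper around the copy.
    // Nobody else holds a reference to the copy, so it can no longer change.
    static <T> List<T> snapshotOf(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> jedis = new ArrayList<>();
        jedis.add("Luke Skywalker");
        List<String> frozen = snapshotOf(jedis);
        jedis.add("Darth Vader");
        System.out.println(frozen); // [Luke Skywalker]
    }
}
```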
Immutable & Non-persistent
● No updates
● Flexibility to convert source in a more efficient representation
● No locking in context of concurrency
● Satisfies co-variant subtyping requirements
● Can be copied with modifications to create a new version (can be expensive)
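To make the "copy with modifications" point concrete, the sketch below derives a new immutable list by copying every element of the old one, which is why the cost grows with the size of the collection. `CopyOnChange` and `with` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: a non-persistent "new version" is a full O(n) copy.
class CopyOnChange {

    static <T> List<T> with(List<T> source, T extra) {
        List<T> copy = new ArrayList<>(source.size() + 1);
        copy.addAll(source);   // every existing element is copied
        copy.add(extra);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<String> v1 = with(Collections.<String>emptyList(), "Luke");
        List<String> v2 = with(v1, "Leia");
        System.out.println(v1); // [Luke]
        System.out.println(v2); // [Luke, Leia]
    }
}
```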
Immutable vs. Mutable hierarchy
ImmutableList MutableList
+ ImmutableList<T> toImmutable()
java.util.List
+ MutableList<T> toList()
Eclipse Collections (formerly GS Collections) https://projects.eclipse.org/projects/technology.collections/
ListIterable
Immutable and Persistent
● Changing the source produces a new version of the collection
● The resulting collection shares structure with the source to avoid full copying on updates
LISP anyone?
Persistent List (aka Cons)
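public final class Cons<T> implements ConsList<T> {
    private final T head;           // first element
    private final ConsList<T> tail; // remaining elements, shared and never copied
    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }
    @Override
    public ConsList<T> add(T e) {
        // Prepend: the new cell points at this list, so nothing is copied.
        return new Cons<>(e, this);
    }
}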
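The slide does not show the `ConsList` interface or an empty list, so the following is a self-contained sketch under assumptions: the interface methods (`isEmpty`, `head`, `tail`, `empty`) and the `Nil` class are guesses at a reasonable shape, not the authors' code. It shows that two lists built from the same base share that base instead of copying it.

```java
import java.util.NoSuchElementException;

// Assumed interface shape; only add() appears in the original slide.
interface ConsList<T> {
    ConsList<T> add(T e);
    boolean isEmpty();
    T head();
    ConsList<T> tail();

    static <T> ConsList<T> empty() {
        return new Nil<>();
    }
}

// Hypothetical empty-list terminator.
final class Nil<T> implements ConsList<T> {
    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
    @Override public boolean isEmpty() { return true; }
    @Override public T head() { throw new NoSuchElementException("empty list"); }
    @Override public ConsList<T> tail() { throw new NoSuchElementException("empty list"); }
}

final class Cons<T> implements ConsList<T> {
    private final T head;
    private final ConsList<T> tail;

    Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
    @Override public boolean isEmpty() { return false; }
    @Override public T head() { return head; }
    @Override public ConsList<T> tail() { return tail; }
}

class ConsDemo {
    public static void main(String[] args) {
        ConsList<String> base = ConsList.<String>empty().add("C").add("B");
        ConsList<String> v1 = base.add("A"); // A -> B -> C
        ConsList<String> v2 = base.add("X"); // X -> B -> C
        // Both versions point at the very same B -> C cells.
        System.out.println(v1.tail() == base && v2.tail() == base); // true
    }
}
```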
Updating Persistent List
Before: A B C X Y Z
After: A B D
Blue nodes indicate new copies
Purple nodes indicate nodes we wish to update
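The update pictured above can be sketched in code: replacing C with D copies the nodes in front of the change (A and B), creates the new node D, and reuses X, Y, Z unchanged. This is a minimal illustration with an invented `Node` class where `null` stands for the empty list; it is not taken from the talk's sample code.

```java
// Hypothetical singly linked persistent list; null represents the empty list.
final class Node<T> {
    final T value;
    final Node<T> next;

    Node(T value, Node<T> next) {
        this.value = value;
        this.next = next;
    }

    @SafeVarargs
    static <T> Node<T> of(T... values) {
        Node<T> list = null;
        for (int i = values.length - 1; i >= 0; i--) {
            list = new Node<>(values[i], list);
        }
        return list;
    }

    // Returns a new list with the element at index replaced.
    // Nodes before index are copied, nodes after index are shared.
    static <T> Node<T> set(Node<T> list, int index, T value) {
        if (list == null) {
            throw new IndexOutOfBoundsException("index outside the list");
        }
        if (index == 0) {
            return new Node<>(value, list.next);
        }
        return new Node<>(list.value, set(list.next, index - 1, value));
    }

    static <T> Node<T> nodeAt(Node<T> list, int index) {
        Node<T> current = list;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current;
    }

    public static void main(String[] args) {
        Node<String> before = Node.of("A", "B", "C", "X", "Y", "Z");
        Node<String> after = Node.set(before, 2, "D");
        System.out.println(nodeAt(before, 2).value);                 // C
        System.out.println(nodeAt(after, 2).value);                  // D
        System.out.println(nodeAt(after, 3) == nodeAt(before, 3));   // true, X Y Z shared
        System.out.println(nodeAt(after, 0) == nodeAt(before, 0));   // false, A was copied
    }
}
```

The recursion depth equals the index being updated, so a production version would likely use a loop; the recursive form is kept here because it mirrors the picture most directly.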
Concatenating Two Persistent Lists
- Poor locality due to pointer chasing
- Copying of nodes
Before: A B C and X Y Z
After: A B C
Persistent List
● Structural sharing: no need to copy full structure
● Poor locality due to pointer chasing
● Copying becomes more expensive with larger lists
● Poor Random Access and thus Data Decomposition
Updating Persistent Binary Tree
[Figure: the tree before and after the update]
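The same idea carries over from lists to trees: an update copies only the nodes on the path from the root to the change and shares every other subtree. The sketch below uses an unbalanced binary search tree of ints for brevity; the class name `PTree` is invented, and real persistent trees are usually balanced.

```java
// Hypothetical persistent binary search tree; null represents the empty tree.
final class PTree {
    final int key;
    final PTree left;
    final PTree right;

    PTree(int key, PTree left, PTree right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    // Path copying: rebuild the nodes on the search path, reuse the rest.
    static PTree insert(PTree root, int key) {
        if (root == null) {
            return new PTree(key, null, null);
        }
        if (key < root.key) {
            return new PTree(root.key, insert(root.left, key), root.right);
        }
        if (key > root.key) {
            return new PTree(root.key, root.left, insert(root.right, key));
        }
        return root; // key already present, nothing to copy
    }

    static boolean contains(PTree root, int key) {
        PTree current = root;
        while (current != null) {
            if (key == current.key) {
                return true;
            }
            current = key < current.key ? current.left : current.right;
        }
        return false;
    }

    public static void main(String[] args) {
        PTree before = insert(insert(insert(null, 50), 30), 70);
        PTree after = insert(before, 20);
        System.out.println(contains(before, 20));          // false
        System.out.println(contains(after, 20));           // true
        System.out.println(after.right == before.right);   // true, untouched subtree shared
    }
}
```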
Persistent Array
How do we get the immutability benefits with performance of mutable variants?
Trie
1. Branch factor not limited to binary
2. Leaf nodes contain actual values
3. Picking the right branch is done by using parts of the key as a lookup
[Figure: example trie with a root node, branches keyed by the letters a, b, c, e, f, and leaf values 10, 45, 20]
Persistent Array (Bitmapped Vector Trie)
[Figure: 32-way trie. Level 1 (root) and Level 2 nodes each have slots 0, 1, ... 31 pointing to the next level; leaf nodes hold the elements]
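A rough sketch of how such a trie is indexed: with a branching factor of 32, each level consumes 5 bits of the index. The toy class below is fixed at two levels (1,024 ints) and every name in it is invented; libraries such as Clojure's PersistentVector use a variable depth and further optimisations that are left out here.

```java
// Hypothetical two-level bitmapped vector trie holding exactly 32 * 32 ints.
final class TinyVectorTrie {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS;   // 32 slots per node
    static final int MASK = WIDTH - 1;    // 0b11111
    static final int CAPACITY = WIDTH * WIDTH;

    private final int[][] root; // root[i] is a leaf with 32 elements

    private TinyVectorTrie(int[][] root) {
        this.root = root;
    }

    // Builds a trie where element i has the value i.
    static TinyVectorTrie range() {
        int[][] root = new int[WIDTH][WIDTH];
        for (int i = 0; i < CAPACITY; i++) {
            root[i >>> BITS][i & MASK] = i;
        }
        return new TinyVectorTrie(root);
    }

    int get(int index) {
        checkIndex(index);
        // High 5 bits pick the leaf, low 5 bits pick the slot inside it.
        return root[(index >>> BITS) & MASK][index & MASK];
    }

    // Path copying: copy the root (32 references) and one leaf (32 ints).
    TinyVectorTrie set(int index, int value) {
        checkIndex(index);
        int[][] newRoot = root.clone();
        int[] newLeaf = root[index >>> BITS].clone();
        newLeaf[index & MASK] = value;
        newRoot[index >>> BITS] = newLeaf;
        return new TinyVectorTrie(newRoot);
    }

    // Exposed only so the demo can show which leaves are shared.
    int[] leaf(int leafIndex) {
        return root[leafIndex];
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= CAPACITY) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }

    public static void main(String[] args) {
        TinyVectorTrie before = TinyVectorTrie.range();
        TinyVectorTrie after = before.set(777, -1);
        System.out.println(before.get(777));                   // 777
        System.out.println(after.get(777));                    // -1
        System.out.println(after.leaf(0) == before.leaf(0));   // true, untouched leaf shared
    }
}
```

An update touches 64 slots regardless of which element changes, instead of copying all 1,024 elements.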
Trade-offs
● Large branching factor facilitates iteration but hinders updates
● Small branching factor facilitates updates but hinders traversal
Java Persistent Collections
- Not available as part of the Java Core Library
- Existing projects include:
- PCollections: https://github.com/hrldcpr/pcollections
- Port of Clojure DS: https://github.com/krukow/clj-ds
- Port of Scala DS: https://github.com/andrewoma/dexx
- Now also in Javaslang: http://javaslang.io
Memory usage survey
10,000,000 elements, heap < 32GB
int[]: 40MB
Integer[]: 160MB
ArrayList<Integer>: 215MB
PersistentVector<Integer>: 214MB (Clojure-DS)
Vector<Integer>: 206MB (Dexx, port of Scala-DS)
Data collected using Java Object Layout: http://openjdk.java.net/projects/code-tools/jol/
Takeaways
● Immutable collections reduce the scope for bugs
● Always a compromise between programming safety and performance
● Performance of persistent data structures is improving
Performance Improvements
O(N)
O(1)
O(HYPERSPACE)
Primitive specialised collections
● Collections often hold boxed representations of primitive values
● Java 8 introduced IntStream, LongStream, DoubleStream and primitive specialised functional interfaces
● Other libraries, e.g. Agrona, Koloboke and Eclipse-Collections, provide primitive specialised collections today.
● Valhalla investigates primitive specialised generics
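As a small illustration of the boxing point, the sketch below computes the same sum twice: once over a `List<Integer>`, where every element is a separate `Integer` object that must be unboxed, and once with `IntStream`, which works on plain `int` values. The class and method names are made up for this example and no performance numbers are claimed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Illustrative comparison of boxed and primitive processing.
class BoxedVsPrimitive {

    // Each element is an Integer object; the loop unboxes on every step.
    static int sumBoxed(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i); // autoboxing
        }
        int sum = 0;
        for (Integer value : values) {
            sum += value;  // auto-unboxing
        }
        return sum;
    }

    // IntStream never creates Integer objects.
    static int sumPrimitive(int n) {
        return IntStream.range(0, n).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));     // 499500
        System.out.println(sumPrimitive(1000)); // 499500
    }
}
```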
Java 8 Lazy Collection Initialization
Many allocated HashMaps and ArrayLists are never written to, e.g. with the Null Object pattern
Java 8 adds lazy initialization for the default initialization case
Typically a 1-2% reduction in memory consumption
http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0
HashMaps Basics
...
Han Solo: hash = 72309
Chewbacca: hash = 72309
HashMaps
Chaining: a separate data structure for collision lookups
Probing: store inline and have a probing sequence
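To make the probing side concrete, here is a deliberately tiny open-addressing map with linear probing: colliding keys are stored inline in the next free slot rather than in a separate structure. Everything about it (the name `ProbingMap`, the fixed capacity, no resizing, no removal) is a simplification assumed for this sketch and does not describe any particular library.

```java
// Hypothetical fixed-size String -> int map using linear probing.
final class ProbingMap {
    private final String[] keys;
    private final int[] values;
    private int size;

    ProbingMap(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be at least 2");
        }
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Probing sequence: start at the hashed slot, then step to the right
    // (wrapping around) until the key or an empty slot is found.
    private int slotFor(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(String key, int value) {
        int i = slotFor(key);
        if (keys[i] == null) {
            // Always keep one slot empty so that slotFor terminates.
            if (size == keys.length - 1) {
                throw new IllegalStateException("map is full");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    Integer get(String key) {
        int i = slotFor(key);
        return keys[i] == null ? null : values[i];
    }

    public static void main(String[] args) {
        ProbingMap map = new ProbingMap(8);
        // "Aa" and "BB" have the same String.hashCode(), so they collide.
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.get("Cc")); // null
    }
}
```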
Aliases: Palpatine vs Darth Sidious
Chaining: aka Closed Addressing, aka Open Hashing
Probing: aka Open Addressing, aka Closed Hashing
HashMaps
Chaining: Linked List Based, Tree Based (java.util.HashMap)
Probing
Chaining Based HashMap
Historically maintained a LinkedList in the case of a collision
Problem: with high collision rates, HashMap lookup approaches O(N)
java.util.HashMap in Java 8
Starts by using a List to store colliding values.
Trees used when there are over 8 elements
Tree based nodes use about twice the memory
Makes the heavy-collision lookup case O(log(N)) rather than O(N)
Relies on keys being Comparable
https://github.com/RichardWarburton/map-visualiser
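The sketch below sets up the worst case described above: a key type whose `hashCode()` always returns the same value, so every entry lands in one bucket. Because the key also implements `Comparable`, `java.util.HashMap` in Java 8 and later can order the colliding entries in a tree. The class names are invented, and whether the bucket was actually converted to a tree is internal to HashMap and cannot be observed through the public API, so the example only shows the setup and that lookups stay correct.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key that collides on purpose.
final class CollidingKey implements Comparable<CollidingKey> {
    final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 42; // every key hashes to the same bucket
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CollidingKey && ((CollidingKey) other).id == id;
    }

    // Gives HashMap an ordering to use among keys with equal hashes.
    @Override
    public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }
}

class CollisionDemo {
    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = build(1000);
        System.out.println(map.size());                    // 1000
        System.out.println(map.get(new CollidingKey(999))); // 999
    }
}
```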
So which HashMap is best?
Example Jar-Jar Benchmark
call get() on a single value for a map of size 1
No model of the different factors that affect things!
Tree Optimization - 60% Collisions
Tree Optimization - 10% Collisions
Probing vs Chaining
Probing Maps usually have lower memory consumption
Small Maps: Probing never has long clusters, can be up to 91% faster.
In large maps with high collision rates, probing scales poorly and can be significantly slower.
Takeaways
There’s no clear-cut “winner”.
JDK Implementations try to minimise worst case.
Linear Probing requires a good hashCode() distribution. Hash maps often “precondition” their hashes.
IdentityHashMap has low memory consumption and is fast, use it!
3rd Party libraries offer probing HashMaps, eg Koloboke & Eclipse-Collections.
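As an example of preconditioning, `java.util.HashMap` in Java 8 XORs the upper 16 bits of each hash code into the lower 16 bits before picking a bucket, because the bucket index only uses the low bits. The sketch below reproduces that step in isolation; the class and method names are invented, and other maps use different mixing functions.

```java
// Illustrative re-implementation of the hash spreading step.
class HashSpreading {

    // Same mixing step as Java 8's HashMap: fold the high bits into the low bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table size: only the low bits matter.
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without spreading, both hashes fall into bucket 0 of a 16-slot table.
        System.out.println(bucket(a, 16) + " " + bucket(b, 16));                 // 0 0
        // With spreading, the differing high bits separate them.
        System.out.println(bucket(spread(a), 16) + " " + bucket(spread(b), 16)); // 1 2
    }
}
```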
Conclusions
Any Questions?
www.iteratrlearning.com
● Modern Development with Java 8
● Reactive and Asynchronous Java
● Java Software Development Bootcamp
#javaforceawakens
Further reading
Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays
https://infoscience.epfl.ch/record/64410/files/techlists.pdf
Smaller Footprint for Java Collections
http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf
Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections
http://michael.steindorfer.name/publications/oopsla15.pdf
RRB-Trees: Efficient Immutable Vectors
https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf
Doug Lea’s Analysis of the HashMap implementation tradeoffs
http://www.mail-archive.com/[email protected]/msg02147.html
Java Specialists HashMap article
http://www.javaspecialists.eu/archive/Issue235.html
Sample and Benchmark Code
https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens