Post on 21-Jan-2018
Collectors in the Wild
@JosePaumard
Collectors?
Why should we be interested in collectors?
▪ They are part of the Stream API
▪ And kind of left aside…
Collectors?
YouTube:
▪ Stream tutorials ~700k
▪ Collectors tutorials < 5k
And it’s a pity because it is a very powerful API
@JosePaumard
Microsoft Virtual Academy
Questions? #ColJ8
movies.stream()
    .flatMap(movie -> movie.actors().stream())
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .entrySet().stream()
    .max(Map.Entry.comparingByValue())
    .get();
movies.stream().collect(
Collectors.groupingBy(movie -> movie.releaseYear(),
Collector.of(() -> new HashMap<Actor, AtomicLong>(), (map, movie) -> {
movie.actors().forEach(actor -> map.computeIfAbsent(actor, a -> new AtomicLong()).incrementAndGet()
) ;},(map1, map2) -> {
map2.entrySet().stream().forEach(entry -> map1.computeIfAbsent(entry.getKey(), a -> new AtomicLong()).addAndGet(entry.getValue().get())
) ;return map1 ;
}, new Collector.Characteristics [] {
Collector.Characteristics.CONCURRENT.CONCURRENT}
))
).entrySet().stream().collect(
Collectors.toMap(entry5 -> entry5.getKey(),entry5 -> entry5.getValue()
.entrySet().stream()
.max(Map.Entry.comparingByValue(Comparator.comparing(l -> l.get())))
.get())
).entrySet().stream().max(Comparator.comparing(entry -> entry.getValue().getValue().get())).get();
Do not give bugs a place to hide!
Brian Goetz
Collectors?
Why should we be interested in collectors?
▪ They are part of the Stream API
▪ And kind of left aside…
And it’s a pity because it is a very powerful API
▪ Even if we can also write unreadable code with it!
Agenda
Quick overview of streams
About collectors
Extending existing collectors
Making a collector readable
Creating new collectors
Composing Collectors
A Few Words on Streams
About Streams
A Stream:
▪ Is an object that connects to a source
▪ Has intermediate & terminal operations
▪ Some of the terminal operations can be collectors
▪ A collector can take more collectors as parameters
A Stream is…
An object that connects to a source of data and watches it flow
There is no data « in » a stream: a stream is not a collection
About Streams
On a stream:
▪ Any operation can be modeled with a collector
▪ Why is it interesting?
stream.collect(collector);
Intermediate Operations
1st operation: mapping = changing the type
2nd operation: filtering = removing some objects
3rd operation: flattening
Map, Filter, FlatMap
Three operations that do not need any buffer to work
This is not the case for all operations…
Sorting elements using a comparator
The stream needs to see all the elements before beginning to transmit them
Distinct
The stream needs to remember all the elements it has already seen to decide whether to transmit the next one (or not)
Distinct, sorted
Both operations need a buffer to store all the elements from the source
Intermediate Operations
2 categories:
- Stateless operations = do not need to remember anything
- Stateful operations = do need a buffer
Limit and Skip
Two methods that rely on the order of the elements:
- Limit = keeps the first n elements
- Skip = skips the first n elements
They need to keep track of the index of the elements and to process them in order
Terminal Operations
Intermediate vs Terminal
Only a terminal operation triggers the consuming of the data from the source
movies.stream()
    .filter(movie -> movie.releaseYear() == 2007)
    .flatMap(movie -> movie.actors().stream())
    .map(actor -> actor.toString()); // no terminal operation: nothing is processed
Intermediate vs Terminal
Only a terminal operation triggers the consuming of the data from the source
movies.stream()
    .filter(movie -> movie.releaseYear() == 2007)
    .flatMap(movie -> movie.actors().stream())
    .map(actor -> actor.toString())
    .forEach(name -> System.out.println(name)); // terminal operation: the data is consumed
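A minimal sketch of this laziness, using a plain list of strings instead of the movies of the slides. The class, the counter and the method names are made up for the illustration. The counter shows that the filter is not evaluated until a terminal operation is called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyPipelineDemo {

    // Counts how many times the filter predicate has been evaluated.
    static final AtomicInteger filterCalls = new AtomicInteger();

    // Builds a pipeline made of intermediate operations only: nothing runs yet.
    static Stream<String> buildPipeline(List<String> strings) {
        return strings.stream()
                .filter(s -> {
                    filterCalls.incrementAndGet();
                    return s.length() == 3;
                })
                .map(String::toUpperCase);
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("one", "two", "three", "four");

        Stream<String> pipeline = buildPipeline(strings);
        System.out.println("Filter calls before the terminal operation: " + filterCalls.get());

        // forEach is a terminal operation: it triggers the processing of the source.
        pipeline.forEach(System.out::println);
        System.out.println("Filter calls after the terminal operation: " + filterCalls.get());
    }
}
```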
Terminal Operations
First batch:
- forEach
- count
- max, min
- reduce
- toArray
Will consume all the data
Terminal Operations
Second batch:
- allMatch
- anyMatch
- noneMatch
- findFirst
- findAny
Do not need to consume all the data = short-circuit operations
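The short-circuit behavior can be observed with a counter, in a small sketch on a sequential stream of strings. The class and method names are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ShortCircuitDemo {

    // Returns how many elements a sequential stream examined before anyMatch answered.
    static int examinedByAnyMatch(List<String> strings, String target) {
        AtomicInteger examined = new AtomicInteger();
        strings.stream()
                .peek(s -> examined.incrementAndGet())
                .anyMatch(target::equals); // stops at the first match
        return examined.get();
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("one", "two", "three", "four");
        System.out.println("Elements examined: " + examinedByAnyMatch(strings, "two"));
    }
}
```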
Terminal Operations
Special cases:
- max
- min
- reduce (when called without an identity element)
Return an Optional (to handle empty streams)
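A short sketch of the empty-stream case, with a hypothetical longest() helper that is not part of the talk. The Optional is empty when the stream has no element.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class OptionalTerminalDemo {

    // max() has no meaningful result on an empty stream, hence the Optional.
    static Optional<String> longest(List<String> strings) {
        return strings.stream().max(Comparator.comparing(String::length));
    }

    public static void main(String[] args) {
        System.out.println(longest(Arrays.asList("one", "three", "four")));
        System.out.println(longest(Collections.emptyList()));
    }
}
```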
https://www.youtube.com/watch?v=Ej0sss6cq14
@StuartMarks
A First Collector
And then there is collect!
The most seen:
Takes a collector as a parameter
List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.toList());
A First Collector (bis)
And then there is collect!
The most seen:
Takes a collector as a parameter
Set<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.toSet());
A Second Collector
And then there is collect!
Maybe less well known:
Takes a collector as a parameter
String names = authors.stream()
    .map(Author::getName)
    .collect(Collectors.joining(", "));
Demo Time
A Third Collector
Creating a Map
Map<Integer, List<String>> result = strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.groupingBy(
s -> s.length())
);
groupingBy(String::length) : Map<Integer, List<String>>
one, two, three, four, five, six, seven, eight, nine, ten
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
groupingBy(String::length, downstream)
3 -> [one, two, six, ten].stream().collect(downstream)
4 -> [four, five, nine].stream().collect(downstream)
5 -> [three, seven, eight].stream().collect(downstream)
groupingBy(String::length, Collectors.counting()) : Map<Integer, Long>
3 -> 4L
4 -> 3L
5 -> 3L
A Third Collector (bis)
Creating a Map
Map<Integer, Long> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.groupingBy(
        s -> s.length(), Collectors.counting())
    );
Demo Time
A Collector that Counts
Number of articles per author
A1: Gent & Walsh, Beyond NP: The QSAT Phase Transition
A2: Gent & Hoos & Prosser & Walsh, Morphing: Combining…
flatMap(Article::getAuthors)
Gent, Walsh, Gent, Hoos, Prosser, Walsh
groupingBy(identity(), counting())
Gent -> 2L
Walsh -> 2L
Hoos -> 1L
Prosser -> 1L
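The counting of articles per author could be sketched as follows. The Article class below is a hypothetical, minimal stand-in: the model used in the talk is not shown in the slides, and getAuthors() is assumed to return a Stream so that it fits flatMap(Article::getAuthors).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArticlesPerAuthorDemo {

    // Hypothetical minimal Article type, for illustration only.
    static final class Article {
        final String title;
        final List<String> authors;

        Article(String title, String... authors) {
            this.title = title;
            this.authors = Arrays.asList(authors);
        }

        // Returns a stream, so that it can be passed to flatMap as a method reference.
        Stream<String> getAuthors() {
            return authors.stream();
        }
    }

    // The two articles shown on the slide.
    static List<Article> sampleArticles() {
        return Arrays.asList(
                new Article("Beyond NP: The QSAT Phase Transition", "Gent", "Walsh"),
                new Article("Morphing: Combining…", "Gent", "Hoos", "Prosser", "Walsh"));
    }

    // Flattens the articles into a stream of authors, then counts each author.
    static Map<String, Long> articlesPerAuthor(List<Article> articles) {
        return articles.stream()
                .flatMap(Article::getAuthors)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        articlesPerAuthor(sampleArticles())
                .forEach((author, count) -> System.out.println(author + " -> " + count));
    }
}
```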
Demo Time
Supply, Accumulate and Combine
Creating Lists
A closer look at that code:
List<String> result = strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
The collector:
1) Builds the list (an ArrayList)
2) Adds the elements of the stream one by one
Creating Lists
1) Building the list: supplier
2) Adding an element to that list: accumulator
Supplier<List<E>> supplier = () -> new ArrayList<>();
BiConsumer<List<E>, E> accumulator = (list, e) -> list.add(e);
In parallel
On each CPU (CPU 1, CPU 2), the collector:
1) Builds a list
2) Adds elements one by one
Then:
3) Merges the lists
Creating Lists
1) Building the list: supplier
2) Adding an element to that list: accumulator
3) Combining two lists: combiner
Supplier<List<E>> supplier = ArrayList::new;
BiConsumer<List<E>, E> accumulator = List::add;
BiConsumer<List<E>, List<E>> combiner = List::addAll;
Creating Lists
So we have:
List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(ArrayList::new, List::add, List::addAll);
Creating Lists
So we have:
List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(ArrayList::new, Collection::add, Collection::addAll);
Creating Sets
Almost the same:
Set<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(HashSet::new, Collection::add, Collection::addAll);
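A runnable sketch of the supplier / accumulator / combiner trio on a parallel stream, where the combiner is actually needed. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ThreeArgCollectDemo {

    // Supplier: ArrayList::new, accumulator: List::add, combiner: List::addAll.
    static List<String> nonEmptyList(List<String> strings) {
        return strings.parallelStream()
                .filter(s -> !s.isEmpty())
                .collect(ArrayList::new, List::add, List::addAll);
    }

    // Same pattern, with a HashSet as the mutable container.
    static Set<String> nonEmptySet(List<String> strings) {
        return strings.parallelStream()
                .filter(s -> !s.isEmpty())
                .collect(HashSet::new, Set::add, Set::addAll);
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("one", "", "two", "two", "", "three");
        System.out.println(nonEmptyList(strings));
        System.out.println(nonEmptySet(strings));
    }
}
```

Each thread fills its own container, so the ArrayList and the HashSet do not need to be thread safe; the combiner merges the partial results.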
String Concatenation
Now we need to create a String by concatenating the elements using a separator:
« one, two, six »
Works with Streams of Strings
String Concatenation
Let us collect
strings.stream()
    .filter(s -> s.length() == 3)
    .collect(() -> new String(),
             (finalString, s) -> finalString.concat(s),
             (s1, s2) -> s1.concat(s2));
String Concatenation
This cannot work: a String is immutable, so concat() returns a new String and the container is never modified
We need a mutable container
String Concatenation
Let us collect
strings.stream()
    .filter(s -> s.length() == 3)
    .collect(() -> new StringBuilder(),
             (sb, s) -> sb.append(s),
             (sb1, sb2) -> sb1.append(sb2));
String Concatenation
Let us collect
strings.stream()
    .filter(s -> s.length() == 3)
    .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);
String Concatenation
Let us collect
StringBuilder stringBuilder = strings.stream()
    .filter(s -> s.length() == 3)
    .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);
String Concatenation
Let us collect
String string = strings.stream()
    .filter(s -> s.length() == 3)
    .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
    .toString();
A Collector is…
3 Operations
- Supplier: creates the mutable container
- Accumulator
- Combiner
A Collector is…
3 + 1 Operations
- Supplier: creates the mutable container
- Accumulator
- Combiner
- Finisher, that can be the identity function
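The 3 + 1 operations can be put together with Collector.of(). The sketch below is one possible way to get the separator of the concatenation example, using a StringJoiner as the mutable container; joiningWith is a made-up name, and the JDK already provides Collectors.joining() for this purpose.

```java
import java.util.StringJoiner;
import java.util.stream.Collector;
import java.util.stream.Stream;

class JoiningCollectorDemo {

    // T = String (elements), A = StringJoiner (mutable container), R = String (result).
    static Collector<String, StringJoiner, String> joiningWith(String separator) {
        return Collector.of(
                () -> new StringJoiner(separator), // supplier: creates the mutable container
                StringJoiner::add,                 // accumulator: adds one element
                StringJoiner::merge,               // combiner: merges two partial containers
                StringJoiner::toString);           // finisher: builds the final String
    }

    public static void main(String[] args) {
        String result = Stream.of("one", "two", "six").collect(joiningWith(", "));
        System.out.println(result);
    }
}
```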
Collecting and Then
And we have a collector for that!
strings.stream()
    .filter(s -> s.length() == 3)
    .collect(
        Collectors.collectingAndThen(
            collector,
            finisher // Function
        ));
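As an illustration, here is a small sketch where the collector is toList() and the finisher wraps the result in an unmodifiable list. The class and method names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class CollectingAndThenDemo {

    // Collects into a list, then applies a finisher to the collected list.
    static List<String> threeLetterWords(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(),             // collector
                        Collections::unmodifiableList)); // finisher
    }

    public static void main(String[] args) {
        System.out.println(threeLetterWords(Arrays.asList("one", "two", "three", "six")));
    }
}
```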
Demo Time
7634L -> {2004, 7634L}
Map<Long, List<Entry<Integer, Long>>>
Entry<Integer, Long> -> Integer = mapping
Function<Entry<Integer, Long>, Integer> mapper = entry -> entry.getKey();
Collectors.mapping(mapper, toList());
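A possible sketch of this step: a histogram (count per year) is inverted into a map from each count to the years that reach it, with mapping() used as the downstream collector. The data and the names are invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class HistogramInversionDemo {

    // Groups the entries by their value (the count) and keeps only their key (the year).
    static Map<Long, List<Integer>> yearsByCount(Map<Integer, Long> countPerYear) {
        return countPerYear.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Made-up numbers, not the data set of the talk.
        Map<Integer, Long> countPerYear = new TreeMap<>();
        countPerYear.put(2004, 2L);
        countPerYear.put(2005, 3L);
        countPerYear.put(2006, 2L);
        System.out.println(yearsByCount(countPerYear));
    }
}
```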
Demo Time
Collect toMap
Useful for remapping maps
Do not generate duplicate keys: the two-argument toMap throws an IllegalStateException on duplicates!
map.entrySet().stream().collect(
Collectors.toMap(entry -> entry.getKey(), entry -> // create a new value
));
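A small sketch of the duplicate key issue, on a made-up map of words and their lengths. Inverting the map creates duplicate keys: the two-argument toMap() fails, while the three-argument version takes a merge function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDemo {

    // Throws an IllegalStateException if two entries share the same value.
    static Map<Integer, String> invert(Map<String, Integer> map) {
        return map.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getValue, Map.Entry::getKey));
    }

    // The merge function decides what to do when a key is generated twice.
    static Map<Integer, String> invertMerging(Map<String, Integer> map) {
        return map.entrySet().stream()
                .collect(Collectors.toMap(
                        Map.Entry::getValue,
                        Map.Entry::getKey,
                        (first, second) -> first + ", " + second));
    }

    public static void main(String[] args) {
        Map<String, Integer> lengths = new LinkedHashMap<>();
        lengths.put("one", 3);
        lengths.put("two", 3);
        lengths.put("four", 4);
        System.out.println(invertMerging(lengths));
    }
}
```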
Custom Collectors:
1) Filter, Flat Map
2) Joins
3) Composition
Coffee break!
About Types
The Collector Interface
public interface Collector<T, A, R> {
    public Supplier<A> supplier(); // A: mutable container
    public BiConsumer<A, T> accumulator(); // T: processed elements
    public BinaryOperator<A> combiner(); // Often the type returned
    public Function<A, R> finisher(); // Final touch
}
The Collector Interface
public interface Collector<T, A, R> {
    public Supplier<A> supplier(); // A: mutable container
    public BiConsumer<A, T> accumulator(); // T: processed elements
    public BinaryOperator<A> combiner(); // Often the type returned
    public Function<A, R> finisher(); // Final touch
    public Set<Characteristics> characteristics();
}
Type of a Collector
In a nutshell:
- T: type of the elements of the stream
- A: type of the mutable container
- R: type of the final container
We often have A = R
The finisher may be the identity function
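To make the three types concrete, here is a sketch of a direct implementation of the interface for the case A = R. ToListCollector is a made-up name; Collectors.toList() already does this job in the JDK.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

// T: elements of the stream, A = R = List<T>: container and result are the same object.
class ToListCollector<T> implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {
            left.addAll(right);
            return left;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Collector.Characteristics> characteristics() {
        return EnumSet.of(Collector.Characteristics.IDENTITY_FINISH);
    }

    public static void main(String[] args) {
        List<String> result = Stream.of("one", "two", "three").collect(new ToListCollector<>());
        System.out.println(result);
    }
}
```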
one, two, three, four, five, six, seven, eight, nine, ten
groupingBy(String::length)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
Collector<String, ?, Map<Integer, List<String>>> c = groupingBy(String::length)
With a downstream collector:
Collector<String, ?, Map<Integer, Value>> c = groupingBy(
    String::length,
    Collector<String, ?, Value>
)
counting() : Collector<T, ?, Long>
3 -> 4L
4 -> 3L
5 -> 3L
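The relation between the type of the downstream collector and the type of the values of the map can be sketched with a generic helper. byLength is a hypothetical method: whatever the downstream collector returns (V) becomes the type of the values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class DownstreamTypeDemo {

    // The result type V of the downstream collector is the value type of the map.
    static <V> Map<Integer, V> byLength(List<String> strings,
                                        Collector<? super String, ?, V> downstream) {
        return strings.stream()
                .collect(Collectors.groupingBy(String::length, downstream));
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
                "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

        Map<Integer, Long> counts = byLength(strings, Collectors.counting());
        Map<Integer, String> joined = byLength(strings, Collectors.joining(", "));

        System.out.println(counts);
        System.out.println(joined);
    }
}
```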
Intermediate Operations
Intermediate Collectors
Back to the mapping collector
This collector takes a downstream collector
stream.collect(mapping(function, downstream));
Intermediate Collectors
The mapping collector provides an intermediate operation
stream.collect(mapping(function, downstream));
Why is it interesting?
To create downstream collectors!
So what about integrating all our stream processing as a collector?
Intermediate Collectors
If collectors can map, why wouldn’t they filter, or flatMap?
…in fact they can in Java 9 ☺
Intermediate Collectors
Like mapping, the filtering collector provides an intermediate operation
We have a Stream<T>
So predicate is a Predicate<T>
Downstream is a Collector<T, ?, R>
stream.collect(mapping(function, downstream));
stream.collect(filtering(predicate, downstream));
Intermediate Collectors
Like mapping, the flatMapping collector provides an intermediate operation
We have a Stream<T>
So flatMapper is a Function<T, Stream<TT>>
And downstream is a Collector<TT, ?, R>
stream.collect(mapping(function, downstream));
stream.collect(flatMapping(flatMapper, downstream));
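On Java 8 these two collectors have to be written by hand. The sketch below is one possible way to do it, by wrapping the accumulator of the downstream collector; it is not the implementation shown in the talk nor the one of the JDK, and the signatures are simplified compared to Collectors.filtering() and Collectors.flatMapping() of Java 9.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IntermediateCollectors {

    // Passes an element to the downstream accumulator only if it matches the predicate.
    static <T, A, R> Collector<T, A, R> filtering(
            Predicate<? super T> predicate, Collector<T, A, R> downstream) {
        BiConsumer<A, T> accumulator = downstream.accumulator();
        return Collector.of(
                downstream.supplier(),
                (container, element) -> {
                    if (predicate.test(element)) {
                        accumulator.accept(container, element);
                    }
                },
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(new Collector.Characteristics[0]));
    }

    // Passes every element of the flat-mapped stream to the downstream accumulator.
    static <T, U, A, R> Collector<T, A, R> flatMapping(
            Function<T, Stream<U>> flatMapper, Collector<U, A, R> downstream) {
        BiConsumer<A, U> accumulator = downstream.accumulator();
        return Collector.of(
                downstream.supplier(),
                (container, element) ->
                        flatMapper.apply(element).forEach(u -> accumulator.accept(container, u)),
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(new Collector.Characteristics[0]));
    }

    // Words grouped by length, keeping only the words that contain an "e".
    static Map<Integer, List<String>> wordsContainingE(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(String::length,
                        filtering(word -> word.contains("e"), Collectors.toList())));
    }

    // Letters used by the words of each length.
    static Map<Integer, Set<String>> lettersByLength(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(String::length,
                        flatMapping(word -> Arrays.stream(word.split("")), Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "two", "three", "four");
        System.out.println(wordsContainingE(words));
        System.out.println(lettersByLength(words));
    }
}
```

Filtering in the downstream collector keeps the key 4 with an empty list, whereas a filter() call before the groupingBy would have removed that key from the map.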
Demo Time
Characteristics
Three characteristics for the collectors:
- IDENTITY_FINISH: the finisher is the identity function
- UNORDERED: the collector does not preserve the order of the elements
- CONCURRENT: the collector is thread safe (its accumulator can be called concurrently on the same container)
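The characteristics of a collector can be read with characteristics(). A short sketch with three collectors whose characteristics are documented: a collector built by Collector.of() without a finisher, toSet() and groupingByConcurrent(). The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {

    // Built without a finisher: Collector.of() then adds IDENTITY_FINISH.
    static Collector<String, List<String>, List<String>> listCollector() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> {
                    left.addAll(right);
                    return left;
                });
    }

    public static void main(String[] args) {
        Collector<String, ?, Set<String>> toSet = Collectors.toSet();
        Collector<String, ?, ConcurrentMap<Integer, List<String>>> concurrentGrouping =
                Collectors.groupingByConcurrent(String::length);

        System.out.println("listCollector: " + listCollector().characteristics());
        System.out.println("toSet: " + toSet.characteristics());
        System.out.println("groupingByConcurrent: " + concurrentGrouping.characteristics());
    }
}
```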
Handling Empty Optionals
Two things:
- Make an Optional a Stream
- Remove the empty Streams with flatMap
Map<K, Optional<V>> // with empty Optionals...
-> Map<K, Stream<V>> // with empty Streams
-> Stream<Map.Entry<K, V>> // the empty are gone
-> Map<K, V> // using a toMap
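These steps could be sketched as follows on Java 8. The names toStream and removeEmpty are invented for the example, and the intermediate Map<K, Stream<V>> is not materialized: the Optional is turned into a stream inside the flatMap.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class EmptyOptionalsDemo {

    // Step 1: an Optional becomes a stream of zero or one element.
    static <V> Stream<V> toStream(Optional<V> optional) {
        return optional.map(Stream::of).orElseGet(Stream::empty);
    }

    // Step 2: flatMap drops the empty streams, then toMap rebuilds a map.
    static <K, V> Map<K, V> removeEmpty(Map<K, Optional<V>> map) {
        return map.entrySet().stream()
                .flatMap(entry -> toStream(entry.getValue())
                        .map(value -> new AbstractMap.SimpleEntry<>(entry.getKey(), value)))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Optional<Integer>> map = new HashMap<>();
        map.put("a", Optional.of(1));
        map.put("b", Optional.empty());
        System.out.println(removeEmpty(map));
    }
}
```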
Joins
1) The authors that published the most together
2) The authors that published the most together in a year
StreamsUtils to the rescue!
A1: Gent & Walsh, Beyond NP: The QSAT Phase Transition
A2: Gent & Hoos & Prosser & Walsh, Morphing: Combining…
flatMap()
A1: Gent, Walsh -> {Gent, Walsh}
A2: Gent, Hoos, Prosser, Walsh -> {Gent, Hoos} {Gent, Prosser} {Gent, Walsh} {Hoos, Prosser} {Hoos, Walsh} {Prosser, Walsh}
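A sketch of the pair generation with plain JDK streams. It does not use StreamsUtils, whose API is not shown in the slides; articles are reduced to lists of author names and a pair is represented by a simple "A & B" string, both being simplifications made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class AuthorPairsDemo {

    // All the pairs of authors of one article, each pair in alphabetical order.
    static Stream<String> pairs(List<String> authors) {
        List<String> sorted = authors.stream().sorted().collect(Collectors.toList());
        return IntStream.range(0, sorted.size()).boxed()
                .flatMap(i -> IntStream.range(i + 1, sorted.size())
                        .mapToObj(j -> sorted.get(i) + " & " + sorted.get(j)));
    }

    // Number of articles published together, for each pair of authors.
    static Map<String, Long> articlesPerPair(List<List<String>> articles) {
        return articles.stream()
                .flatMap(AuthorPairsDemo::pairs)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // The pair that published the most together.
    static Map.Entry<String, Long> mostPublishedPair(List<List<String>> articles) {
        return articlesPerPair(articles).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get();
    }

    public static void main(String[] args) {
        List<List<String>> articles = Arrays.asList(
                Arrays.asList("Gent", "Walsh"),
                Arrays.asList("Gent", "Hoos", "Prosser", "Walsh"));
        System.out.println(articlesPerPair(articles));
        System.out.println(mostPublishedPair(articles));
    }
}
```

The call to get() assumes that at least one article has two authors; on an empty histogram the Optional returned by max() is empty.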
Demo Time
Application
What is interesting in modeling a processing as a collector?
We can reuse this collector as a downstream collector for other processing pipelines
What About Readability?
Creating composable Collectors
Demo Time
Dealing with Issues
The main issue is the empty stream
A whole stream may have elements
But when we build a histogram, a given substream may become empty…
Conclusion
API Collector
A very rich API indeed
Quite complex…
One needs to have a very precise idea of the data processing pipeline
Can be extended!
API Collector
A collector can model a whole processing
Once it is written, it can be passed as a downstream to another processing pipeline
Can be made composable to improve readability
https://github.com/JosePaumard
Thank you for your attention!
Questions?
@JosePaumard
https://github.com/JosePaumard
https://www.slideshare.net/jpaumard
https://www.youtube.com/user/JPaumard