Collectors in the Wild

Posted on 21-Jan-2018


@JosePaumard

Collectors?

Why should we be interested in collectors?

▪ They are part of the Stream API

▪ And kind of left aside…

Collectors?

YouTube:

▪ Stream tutorials ~700k

▪ Collectors tutorials < 5k

And it’s a pity because it is a very powerful API

@JosePaumard

Microsoft Virtual Academy

Questions? #ColJ8

movies.stream()
    .flatMap(movie -> movie.actors().stream())
    .collect(
        Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .entrySet().stream()
    .max(Map.Entry.comparingByValue())
    .get();

movies.stream().collect(
    Collectors.groupingBy(
        movie -> movie.releaseYear(),
        Collector.of(
            () -> new HashMap<Actor, AtomicLong>(),
            (map, movie) -> {
                movie.actors().forEach(
                    actor -> map.computeIfAbsent(actor, a -> new AtomicLong()).incrementAndGet()
                );
            },
            (map1, map2) -> {
                map2.entrySet().stream().forEach(
                    entry -> map1.computeIfAbsent(entry.getKey(), a -> new AtomicLong()).addAndGet(entry.getValue().get())
                );
                return map1;
            },
            new Collector.Characteristics[] {
                Collector.Characteristics.CONCURRENT
            }
        )
    )
).entrySet().stream().collect(
    Collectors.toMap(
        entry5 -> entry5.getKey(),
        entry5 -> entry5.getValue()
            .entrySet().stream()
            .max(Map.Entry.comparingByValue(Comparator.comparing(l -> l.get())))
            .get())
).entrySet().stream().max(Comparator.comparing(entry -> entry.getValue().getValue().get())).get();

Do not give bugs a place to hide!

Brian Goetz

Collectors?

It is a very powerful API

▪ Even if we can also write unreadable code with it!

Agenda

Quick overview about streams

About collectors

Extending existing collectors

Making a collector readable

Creating new collectors

Composing Collectors

A Few Words on Streams

About Streams

A Stream:

▪ Is an object that connects to a source

▪ Has intermediate & terminal operations

▪ Some of the terminal operations can be collectors

▪ A collector can take more collectors as parameters

A Stream is…

An object that connects to a source of data and watches it flow

There is no data « in » a stream: a stream ≠ a collection

About Streams

On a stream:

▪ Any operation can be modeled with a collector

▪ Why is it interesting?

stream.collect(collector);

Intermediate Operations

1st operation: mapping = changing the type

2nd operation: filtering = removing some objects

3rd operation: flattening

Map, Filter, FlatMap

Three operations that do not need any buffer to work

This is not the case for all the operations…

Sorting elements using a comparator

The stream needs to see all the elements before beginning to transmit them

Distinct

The stream needs to remember all the elements it has seen before transmitting the next one (or not)

Distinct, sorted

Both operations need a buffer to store all the elements from the source

Intermediate Operations

2 categories:

- Stateless operations = do not need to remember anything

- Stateful operations = do need a buffer

Limit and Skip

Two methods that rely on the order of the elements:

- Limit = keeps the first n elements

- Skip = skips the first n elements

They need to keep track of the index of the elements and to process them in order
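As a minimal sketch of the stateful and order-dependent operations described above (the class name and the word list are made up for this illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StatefulOperations {

    static final List<String> WORDS =
            Arrays.asList("three", "one", "two", "one", "four", "two");

    // distinct() has to remember every element it has already let through
    static List<String> distinctWords() {
        return WORDS.stream().distinct().collect(Collectors.toList());
    }

    // sorted() has to see all the elements before it can emit the first one
    static List<String> sortedWords() {
        return WORDS.stream().sorted().collect(Collectors.toList());
    }

    // skip() and limit() depend on the position of each element in the stream
    static List<String> secondAndThird() {
        return WORDS.stream().skip(1).limit(2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctWords());   // [three, one, two, four]
        System.out.println(sortedWords());     // [four, one, one, three, two, two]
        System.out.println(secondAndThird());  // [one, two]
    }
}
```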

Terminal Operations

Intermediate vs Terminal

Only a terminal operation triggers the consumption of the data from the source

// No terminal operation: nothing is consumed
movies.stream()
    .filter(movie -> movie.releaseYear() == 2007)
    .flatMap(movie -> movie.actors().stream())
    .map(actor -> actor.toString());

// forEach() is a terminal operation: the data is consumed
movies.stream()
    .filter(movie -> movie.releaseYear() == 2007)
    .flatMap(movie -> movie.actors().stream())
    .map(actor -> actor.toString())
    .forEach(name -> System.out.println(name));
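One way to observe this laziness is to count the elements that actually travel through the pipeline. The sketch below uses made-up titles and a hypothetical class name; peek() is only there to do the counting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyPipeline {

    // Returns {elements seen before the terminal operation, elements seen after it}
    static int[] countProcessedElements() {
        List<String> titles = Arrays.asList("Alien", "Brazil", "Casablanca");
        AtomicInteger seen = new AtomicInteger();

        // Only intermediate operations: the pipeline is declared, not executed
        Stream<String> pipeline = titles.stream()
                .peek(title -> seen.incrementAndGet())
                .map(String::toUpperCase);
        int before = seen.get();

        // The terminal operation pulls the data through the pipeline
        pipeline.forEach(title -> { });
        int after = seen.get();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = countProcessedElements();
        System.out.println(counts[0] + " then " + counts[1]);  // 0 then 3
    }
}
```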

Terminal Operations

First batch:

- forEach

- count

- max, min

- reduce

- toArray

Will consume all the data

Terminal Operations

Second batch:

- allMatch

- anyMatch

- noneMatch

- findFirst

- findAny

Do not need to consume all the data = short-circuit operations

Terminal Operations

Special cases:

- max

- min

- reduce (without an identity element)

Return an Optional (to handle empty streams)
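The two behaviours above, short-circuiting and Optional results, can be sketched as follows. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

class TerminalOperations {

    // max() on an empty stream has nothing to return, hence the Optional
    static Optional<String> longestOf(Stream<String> words) {
        return words.max(Comparator.comparingInt(String::length));
    }

    // reduce() with an identity element returns a plain value instead
    static int totalLength(Stream<String> words) {
        return words.map(String::length).reduce(0, Integer::sum);
    }

    // anyMatch() stops at the first match, so it terminates even on an infinite stream
    static boolean hasMultipleOfSeven() {
        return Stream.iterate(1, i -> i + 1).anyMatch(i -> i % 7 == 0);
    }

    public static void main(String[] args) {
        System.out.println(longestOf(Stream.empty()).isPresent());  // false
        System.out.println(totalLength(Stream.of("one", "three"))); // 8
        System.out.println(hasMultipleOfSeven());                   // true
    }
}
```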

https://www.youtube.com/watch?v=Ej0sss6cq14

@StuartMarks

A First Collector

And then there is collect!

The most seen:

Takes a collector as a parameter

List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.toList());

A First Collector (bis)

And then there is collect!

The most seen:

Takes a collector as a parameter

Set<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.toSet());

A Second Collector

And then there is collect!

Maybe less known?

Takes a collector as a parameter

String names = authors.stream()
    .map(Author::getName)
    .collect(Collectors.joining(", "));
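A small runnable variant of the joining collector, working directly on a list of names rather than on the Author type of the slide. The class name is hypothetical; the three-argument overload adds a prefix and a suffix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoiningNames {

    // Separator only
    static String join(List<String> names) {
        return names.stream().collect(Collectors.joining(", "));
    }

    // Separator, prefix and suffix
    static String joinInBraces(List<String> names) {
        return names.stream().collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Gent", "Hoos", "Prosser", "Walsh");
        System.out.println(join(names));          // Gent, Hoos, Prosser, Walsh
        System.out.println(joinInBraces(names));  // {Gent, Hoos, Prosser, Walsh}
    }
}
```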

Demo Time

A Third Collector

Creating a Map

Map<Integer, List<String>> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.groupingBy(s -> s.length()));

groupingBy(String::length)

one, two, three, four, five, six, seven, eight, nine, ten

3 -> one, two, six, ten

4 -> four, five, nine

5 -> three, seven, eight

Map<Integer, List<String>>

groupingBy(String::length, downstream)

one, two, three, four, five, six, seven, eight, nine, ten

3 -> one, two, six, ten -> .stream().collect(downstream)

4 -> four, five, nine -> .stream().collect(downstream)

5 -> three, seven, eight -> .stream().collect(downstream)

groupingBy(String::length, Collectors.counting())

one, two, three, four, five, six, seven, eight, nine, ten

3 -> one, two, six, ten -> 4L

4 -> four, five, nine -> 3L

5 -> three, seven, eight -> 3L

Map<Integer, Long>

A Third Collector (bis)

Creating a Map

Map<Integer, Long> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.groupingBy(s -> s.length(), Collectors.counting()));
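The two groupingBy variants can be tried on the ten words used in the diagrams. This is a minimal sketch; the class name is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class WordLengthHistogram {

    static final List<String> WORDS = Arrays.asList(
            "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

    // Without a downstream collector, each group is gathered in a List
    static Map<Integer, List<String>> byLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // With counting() as downstream, each group is reduced to its size
    static Map<Integer, Long> countByLength() {
        return WORDS.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(byLength().get(3));       // [one, two, six, ten]
        System.out.println(countByLength().get(3));  // 4
    }
}
```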

Demo Time

A Collector that Counts

Number of articles per author

A1: Gent & Walsh, Beyond NP: The QSAT Phase Transition

A2: Gent & Hoos & Prosser & Walsh, Morphing: Combining…

flatMap(Article::getAuthors)

Gent, Walsh, Gent, Hoos, Prosser, Walsh

groupingBy(identity(), counting())

Gent -> 2L

Walsh -> 2L

Hoos -> 1L

Prosser -> 1L
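A sketch of that pipeline, assuming a minimal Article type whose getAuthors() returns a Stream, as the slide's flatMap(Article::getAuthors) suggests. The Article class below is hypothetical and only carries what the example needs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArticlesPerAuthor {

    // Hypothetical minimal Article type, only here to support the example
    static final class Article {
        private final List<String> authors;

        Article(String... authors) {
            this.authors = Arrays.asList(authors);
        }

        Stream<String> getAuthors() {
            return authors.stream();
        }
    }

    static Map<String, Long> countArticlesPerAuthor(List<Article> articles) {
        return articles.stream()
                .flatMap(Article::getAuthors)                       // one element per (article, author)
                .collect(Collectors.groupingBy(
                        Function.identity(), Collectors.counting()));
    }

    // The two articles of the slide, reduced to their author lists
    static List<Article> sampleArticles() {
        return Arrays.asList(
                new Article("Gent", "Walsh"),
                new Article("Gent", "Hoos", "Prosser", "Walsh"));
    }

    public static void main(String[] args) {
        System.out.println(countArticlesPerAuthor(sampleArticles()));
    }
}
```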

Demo Time

Supply, Accumulate and Combine

Creating Lists

A closer look at that code:

List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(Collectors.toList());

The collector:

1) Builds the list (an ArrayList)

2) Adds the elements of the stream one by one

Creating Lists

1) Building the list: supplier

2) Adding an element to that list: accumulator

Supplier<List<E>> supplier = () -> new ArrayList<>();

BiConsumer<List<E>, E> accumulator = (list, e) -> list.add(e);

In parallel

Each CPU (CPU 1, CPU 2) processes its part of the stream with its own collector:

1) Build a list

2) Add elements one by one

3) Merge the lists

Creating Lists

1) Building the list: supplier

2) Adding an element to that list: accumulator

3) Combining two lists: combiner

Supplier<List<E>> supplier = ArrayList::new;

BiConsumer<List<E>, E> accumulator = List::add;

BiConsumer<List<E>, List<E>> combiner = List::addAll;

Creating Lists

So we have:

List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(ArrayList::new, List::add, List::addAll);

Or, more generally:

List<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(ArrayList::new, Collection::add, Collection::addAll);

Creating Sets

Almost the same:

Set<String> result = strings.stream()
    .filter(s -> !s.isEmpty())
    .collect(HashSet::new, Collection::add, Collection::addAll);

String Concatenation

Now we need to create a String by concatenating the elements using a separator:

« one, two, six »

Works with Streams of Strings

String Concatenation

Let us collect

strings.stream()
    .filter(s -> s.length() == 3)
    .collect(() -> new String(),
             (finalString, s) -> finalString.concat(s),
             (s1, s2) -> s1.concat(s2));

This does not work: a String is immutable, so concat() returns a new String that is discarded. The container has to be mutable.

String Concatenation

Let us collect

strings.stream()
    .filter(s -> s.length() == 3)
    .collect(() -> new StringBuilder(),
             (sb, s) -> sb.append(s),
             (sb1, sb2) -> sb1.append(sb2));

The same with method references:

strings.stream()
    .filter(s -> s.length() == 3)
    .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);

String Concatenation

Let us collect

StringBuilder stringBuilder = strings.stream()
    .filter(s -> s.length() == 3)
    .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);

And to get a String:

String string = strings.stream()
    .filter(s -> s.length() == 3)
    .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
    .toString();
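The StringBuilder version concatenates the elements but does not insert the separator asked for earlier. One possible way to get it with the same three-function collect() is to use java.util.StringJoiner as the mutable container. This is a swapped-in technique, not the one shown on the slides, and the class name is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class JoinWithSeparator {

    static String joinThreeLetterWords(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(() -> new StringJoiner(", "),  // supplier: the mutable container
                         StringJoiner::add,             // accumulator: adds one element
                         StringJoiner::merge)           // combiner: merges two containers
                .toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");
        System.out.println(joinThreeLetterWords(words));  // one, two, six, ten
    }
}
```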

A Collector is…

3 + 1 Operations

- Supplier: creates the mutable container

- Accumulator

- Combiner

- Finisher, that can be the identity function

Collecting and Then

And we have a collector for that!

strings.stream()
    .filter(s -> s.length() == 3)
    .collect(
        Collectors.collectingAndThen(
            collector,
            finisher    // Function
        ));
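A concrete instance of collectingAndThen, with toList() as the collector and Collections::unmodifiableList as the finisher. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class UnmodifiableResult {

    static List<String> threeLetterWords(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(),             // the collector
                        Collections::unmodifiableList)); // the finisher
    }

    public static void main(String[] args) {
        List<String> result = threeLetterWords(Arrays.asList("one", "three", "two"));
        System.out.println(result);  // [one, two]
    }
}
```

The returned list rejects modifications: calling add() on it throws an UnsupportedOperationException.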

Demo Time

Map<Long, List<Entry<Integer, Long>>>

For example: 7634L -> {2004, 7634L}

Entry<Integer, Long> -> Integer = mapping

Function<Map.Entry<Integer, Long>, Integer> mapper = entry -> entry.getKey();

Collectors.mapping(mapper, toList());
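Put together, grouping the entries by value and mapping each entry to its key inverts a (year -> count) map into a (count -> years) map. The sketch below uses made-up numbers and a hypothetical class name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class YearsByCount {

    // Inverts a (year -> count) map into a (count -> years) map
    static Map<Long, List<Integer>> yearsByCount(Map<Integer, Long> countPerYear) {
        return countPerYear.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        // keep only the key of each entry, then gather the keys in a List
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Made-up sample data; a TreeMap keeps the years in ascending order
        Map<Integer, Long> countPerYear = new TreeMap<>();
        countPerYear.put(2001, 2L);
        countPerYear.put(2002, 3L);
        countPerYear.put(2003, 2L);
        System.out.println(yearsByCount(countPerYear).get(2L));  // [2001, 2003]
    }
}
```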

Demo Time

Collect toMap

Useful for remapping maps

Do not generate duplicate keys!

map.entrySet().stream().collect(
    Collectors.toMap(
        entry -> entry.getKey(),
        entry -> // create a new value
    ));
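A sketch of a remapping with toMap(), and of what happens with duplicate keys: the two-argument toMap() throws an IllegalStateException, while the three-argument overload takes a merge function. Class and method names are made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RemappingMaps {

    // Keeps the keys and replaces each list by its size
    static Map<Integer, Integer> sizes(Map<Integer, List<String>> wordsByLength) {
        return wordsByLength.entrySet().stream()
                .collect(Collectors.toMap(
                        entry -> entry.getKey(),
                        entry -> entry.getValue().size()));
    }

    // "one" and "two" have the same length: the duplicate key is rejected
    static boolean duplicateKeyIsRejected() {
        try {
            Stream.of("one", "two")
                    .collect(Collectors.toMap(String::length, word -> word));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A merge function tells toMap() how to resolve the collision
    static Map<Integer, String> mergeDuplicates() {
        return Stream.of("one", "two", "three")
                .collect(Collectors.toMap(
                        String::length, word -> word, (a, b) -> a + "/" + b));
    }

    public static void main(String[] args) {
        System.out.println(duplicateKeyIsRejected());  // true
        System.out.println(mergeDuplicates().get(3));  // one/two
    }
}
```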

Custom Collectors:

1) Filter, Flat Map

2) Joins

3) Composition

Coffee break!

About Types

The Collector Interface

public interface Collector<T, A, R> {
    public Supplier<A> supplier();                  // A: mutable container
    public BiConsumer<A, T> accumulator();          // T: processed elements
    public BinaryOperator<A> combiner();            // Often the type returned
    public Function<A, R> finisher();               // Final touch
    public Set<Characteristics> characteristics();
}

Type of a Collector

In a nutshell:

- T: type of the elements of the stream

- A: type of the mutable container

- R: type of the final container

We often have A = R

The finisher may be the identity function
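A sketch of a collector where A and R differ, built with the Collector.of() factory: the elements are Strings (T), the mutable container is a StringBuilder (A) and the result is a String (R). The class and method names are hypothetical.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

class CustomCollector {

    // T = String, A = StringBuilder, R = String
    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,        // supplier: creates the mutable container
                StringBuilder::append,     // accumulator: adds one element
                StringBuilder::append,     // combiner: merges two containers
                StringBuilder::toString);  // finisher: A -> R
    }

    public static void main(String[] args) {
        String result = Stream.of("a", "b", "c").collect(concatenating());
        System.out.println(result);  // abc
    }
}
```

Because the finisher is not the identity function here, no characteristic is passed to Collector.of().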

one, two, three, four, five, six, seven, eight, nine, ten

Collector<String, ?, Map<Integer, List<String>>> c = groupingBy(String::length)

3 -> one, two, six, ten

4 -> four, five, nine

5 -> three, seven, eight

With a downstream collector of type Collector<String, ?, Value>:

Collector<String, ?, Map<Integer, Value>> c = groupingBy(String::length, downstream)

counting() : Collector<T, ?, Long>

3 -> 4L

4 -> 3L

5 -> 3L

Intermediate Operations

Intermediate Collectors

Back to the mapping collector

This collector takes a downstream collector

stream.collect(mapping(function, downstream));

Intermediate Collectors

The mapping Collector provides an intermediate operation

Why is it interesting?

To create downstream collectors!

So what about integrating all our stream processing as a collector?

stream.collect(mapping(function, downstream));

Intermediate Collectors

If collectors can map, why wouldn't they filter, or flatMap?

…in fact they can in Java 9 ☺

Intermediate Collectors

Like mapping, the filtering Collector provides an intermediate operation

We have a Stream<T>

So predicate is a Predicate<T>

Downstream is a Collector<T, ?, R>

stream.collect(mapping(function, downstream));

stream.collect(filtering(predicate, downstream));

Intermediate Collectors

Like mapping, the flatMapping Collector provides an intermediate operation

We have a Stream<T>

So flatMapper is a Function<T, Stream<TT>>

And downstream is a Collector<TT, ?, R>

stream.collect(mapping(function, downstream));

stream.collect(flatMapping(flatMapper, downstream));
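A sketch of both collectors as downstream collectors of groupingBy. It needs Java 9 or later, and it relies on a hypothetical minimal Article type with made-up data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class IntermediateCollectors {

    // Hypothetical minimal Article type, only here to support the example
    static final class Article {
        final int year;
        final List<String> authors;

        Article(int year, String... authors) {
            this.year = year;
            this.authors = Arrays.asList(authors);
        }
    }

    // Made-up sample data
    static final List<Article> ARTICLES = Arrays.asList(
            new Article(2000, "A", "B"),
            new Article(2000, "A", "B", "C"),
            new Article(2001, "A", "D"));

    // filtering(): per year, count only the articles with more than two authors
    static Map<Integer, Long> bigTeamsPerYear() {
        return ARTICLES.stream().collect(
                Collectors.groupingBy(
                        article -> article.year,
                        Collectors.filtering(
                                article -> article.authors.size() > 2,
                                Collectors.counting())));
    }

    // flatMapping(): per year, gather the authors of all the articles
    static Map<Integer, Set<String>> authorsPerYear() {
        return ARTICLES.stream().collect(
                Collectors.groupingBy(
                        article -> article.year,
                        Collectors.flatMapping(
                                article -> article.authors.stream(),
                                Collectors.toSet())));
    }

    public static void main(String[] args) {
        System.out.println(bigTeamsPerYear());
        System.out.println(authorsPerYear());
    }
}
```

Filtering inside the collector keeps the key 2001 in the result, with a count of 0. Filtering the stream before grouping would drop that key.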

Demo Time

Characteristics

Three characteristics for the collectors:

- IDENTITY_FINISH: the finisher is the identity function

- UNORDERED: the collector does not preserve the order of the elements

- CONCURRENT: the collector is thread safe

Handling Empty Optionals

Two things:

- Make an Optional a Stream

- Remove the empty Streams with flatMap

Map<K, Optional<V>>            // with empty Optionals...
-> Map<K, Stream<V>>           // with empty Streams
-> Stream<Map.Entry<K, V>>     // the empty are gone
-> Map<K, V>                   // using a toMap
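The steps above can be sketched with plain JDK 8 calls. The class and helper names are hypothetical; from Java 9 on, Optional.stream() could replace the map/orElseGet pair.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RemovingEmptyOptionals {

    // One entry with an Optional value -> a stream of zero or one entry with a plain value
    static <K, V> Stream<Map.Entry<K, V>> toStream(Map.Entry<K, Optional<V>> entry) {
        K key = entry.getKey();
        return entry.getValue()
                .map(value -> Stream.<Map.Entry<K, V>>of(new AbstractMap.SimpleEntry<>(key, value)))
                .orElseGet(Stream::empty);
    }

    // flatMap() drops the empty streams, toMap() rebuilds the map
    static <K, V> Map<K, V> withoutEmpties(Map<K, Optional<V>> map) {
        return map.entrySet().stream()
                .flatMap(entry -> toStream(entry))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Optional<Integer>> map = new HashMap<>();
        map.put("a", Optional.of(1));
        map.put("b", Optional.empty());
        System.out.println(withoutEmpties(map));  // {a=1}
    }
}
```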

Joins

1) The authors that published the most together

2) The authors that published the most together in a year

StreamsUtils to the rescue!

Gent & Walsh, Beyond NP: The QSAT Phase Transition

Gent & Hoos & Prosser & Walsh, Morphing: Combining…

The authors of each article:

Gent, Walsh

Gent, Hoos, Prosser, Walsh

flatMap() turns them into pairs:

{Gent, Walsh}

{Gent, Hoos} {Gent, Prosser} {Gent, Walsh} {Hoos, Prosser} {Hoos, Walsh} {Prosser, Walsh}
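The talk relies on the StreamsUtils library for this step. As a rough stand-in using only the JDK, the pairs can be generated with two nested index ranges and counted with groupingBy. The class name, the helper names and the "first & second" key format are assumptions of this sketch; it also assumes the authors appear in the same order in every article, so that a given pair always produces the same key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class AuthorPairs {

    // All the pairs of authors of one article, each written "first & second"
    static Stream<String> pairs(List<String> authors) {
        return IntStream.range(0, authors.size()).boxed()
                .flatMap(i -> IntStream.range(i + 1, authors.size())
                        .mapToObj(j -> authors.get(i) + " & " + authors.get(j)));
    }

    // Number of articles each pair has published together
    static Map<String, Long> countPairs(List<List<String>> authorsPerArticle) {
        return authorsPerArticle.stream()
                .flatMap(AuthorPairs::pairs)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // The pair that published the most together, if any
    static Optional<Map.Entry<String, Long>> mostFrequentPair(List<List<String>> authorsPerArticle) {
        return countPairs(authorsPerArticle).entrySet().stream()
                .max(Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        List<List<String>> articles = Arrays.asList(
                Arrays.asList("Gent", "Walsh"),
                Arrays.asList("Gent", "Hoos", "Prosser", "Walsh"));
        System.out.println(mostFrequentPair(articles).get());  // Gent & Walsh=2
    }
}
```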

Demo Time

Application

What is interesting in modeling a processing as a collector?

We can reuse this collector as a downstream collector for other processings

What About Readability?

Creating composable Collectors

Demo Time

Dealing with Issues

The main issue is the empty stream

A whole stream may have elements

But when we build a histogram, a given substream may become empty…

Conclusion

API Collector

A very rich API indeed

Quite complex…

One needs to have a very precise idea of the data processing pipeline

Can be extended!

API Collector

A collector can model a whole processing

Once it is written, it can be passed as a downstream to another processing pipeline

Can be made composable to improve readability

https://github.com/JosePaumard

Thank you for your attention!

Questions?

@JosePaumard

https://www.slideshare.net/jpaumard

https://www.youtube.com/user/JPaumard