
Top 5 mistakes when writing Spark applications

Transcript
Page 1: Top 5 mistakes when writing Spark applications

Top 5 mistakes when writing Spark applications
tiny.cloudera.com/spark-mistakes

Mark Grover | Software Engineer, Cloudera | @mark_grover

Ted Malaska | Technical Group Architect, Blizzard | @TedMalaska

Page 2: Top 5 mistakes when writing Spark applications

2

About the book

• @hadooparchbook
• hadooparchitecturebook.com
• github.com/hadooparchitecturebook
• slideshare.com/hadooparchbook

Page 3: Top 5 mistakes when writing Spark applications

3

Mistakes people make when using Spark

Page 4: Top 5 mistakes when writing Spark applications

4

Mistakes we’ve made when using Spark

Page 5: Top 5 mistakes when writing Spark applications

5

Mistakes people make when using Spark

Page 6: Top 5 mistakes when writing Spark applications

6

Mistake # 1

Page 7: Top 5 mistakes when writing Spark applications

7

# Executors, cores, memory !?!

• 6 nodes
• 16 cores each
• 64 GB of RAM each

Page 8: Top 5 mistakes when writing Spark applications

8

Decisions, decisions, decisions

• Number of executors (--num-executors)
• Cores for each executor (--executor-cores)
• Memory for each executor (--executor-memory)

• 6 nodes
• 16 cores each
• 64 GB of RAM

Page 9: Top 5 mistakes when writing Spark applications

9

Spark Architecture recap

Page 10: Top 5 mistakes when writing Spark applications

10

Answer #1 – Most granular

• Have the smallest sized executors possible
• 1 core each
• 64 GB/node / 16 executors/node = 4 GB/executor
• Total of 16 cores x 6 nodes = 96 cores => 96 executors

[Diagram: worker node running 16 single-core executors (Executor 1 … Executor 16)]

Page 11: Top 5 mistakes when writing Spark applications

11

Answer #1 – Most granular

• Have the smallest sized executors possible
• 1 core each
• 64 GB/node / 16 executors/node = 4 GB/executor
• Total of 16 cores x 6 nodes = 96 cores => 96 executors

[Diagram: worker node running 16 single-core executors (Executor 1 … Executor 16)]

Page 12: Top 5 mistakes when writing Spark applications

12

Why?

• Not using the benefits of running multiple tasks in the same executor (a shared JVM, broadcast variables, etc.)

Page 13: Top 5 mistakes when writing Spark applications

13

Answer #2 – Least granular

• 6 executors in total => 1 executor per node
• 64 GB memory each
• 16 cores each

[Diagram: worker node running a single 16-core executor]

Page 14: Top 5 mistakes when writing Spark applications

14

Answer #2 – Least granular

• 6 executors in total => 1 executor per node
• 64 GB memory each
• 16 cores each

[Diagram: worker node running a single 16-core executor]

Page 15: Top 5 mistakes when writing Spark applications

15

Why?

• Need to leave some memory overhead for OS/Hadoop daemons

Page 16: Top 5 mistakes when writing Spark applications

16

Answer #3 – with overhead

• 6 executors – 1 executor/node
• 63 GB memory each
• 15 cores each

[Diagram: worker node running one executor, with 1 GB and 1 core set aside as overhead]

Page 17: Top 5 mistakes when writing Spark applications

17

Answer #3 – with overhead

• 6 executors – 1 executor/node
• 63 GB memory each
• 15 cores each

[Diagram: worker node running one executor, with 1 GB and 1 core set aside as overhead]

Page 18: Top 5 mistakes when writing Spark applications

18

Let’s assume…

• You are running Spark on YARN, from here on…

Page 19: Top 5 mistakes when writing Spark applications

19

3 things

• 3 other things to keep in mind

Page 20: Top 5 mistakes when writing Spark applications

20

#1 – Memory overhead

• --executor-memory controls the heap size
• Need some overhead (controlled by spark.yarn.executor.memoryOverhead) for off-heap memory
• Default is max(384 MB, 0.07 * spark.executor.memory)
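
For example, a back-of-the-envelope sketch of what that default means for a 19 GB executor heap (the 19 GB figure is just an example; the point is that the YARN container must fit heap plus overhead):

  // Sketch of the default overhead calculation described above.
  val executorMemoryMb = 19 * 1024                                 // e.g. --executor-memory 19g
  val overheadMb = math.max(384, (0.07 * executorMemoryMb).toInt)  // default: max(384 MB, 7% of heap)
  // overheadMb comes out to about 1361 MB here, so the container needs roughly 20.3 GB, not 19 GB.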

Page 21: Top 5 mistakes when writing Spark applications

21

#2 - YARN AM needs a core: Client mode

Page 22: Top 5 mistakes when writing Spark applications

22

#2 YARN AM needs a core: Cluster mode

Page 23: Top 5 mistakes when writing Spark applications

23

#3 HDFS Throughput

• 15 cores per executor can lead to bad HDFS I/O throughput
• Best is to keep under 5 cores per executor

Page 24: Top 5 mistakes when writing Spark applications

24

Calculations

• 5 cores per executor
  – For max HDFS throughput
• Cluster has 6 x 15 = 90 cores in total (after taking out cores for Hadoop/YARN daemons)
• 90 cores / 5 cores/executor = 18 executors
• Each node has 3 executors
• 63 GB / 3 = 21 GB; 21 x (1 - 0.07) ≈ 19 GB
• 1 executor for the AM => 17 executors

[Diagram: worker node running 3 executors, with overhead set aside]

Page 25: Top 5 mistakes when writing Spark applications

25

Correct answer

• 17 executors in total
• 19 GB memory/executor
• 5 cores/executor

* Not etched in stone

[Diagram: worker node running 3 executors, with overhead set aside]
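
As a minimal sketch, the final sizing can also be expressed as configuration rather than command-line flags (the app name is made up; the three properties mirror --num-executors, --executor-cores and --executor-memory):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("sized-app")                // hypothetical application name
    .set("spark.executor.instances", "17")  // 17 executors in total
    .set("spark.executor.cores", "5")       // 5 cores per executor
    .set("spark.executor.memory", "19g")    // 19 GB heap per executor
  val sc = new SparkContext(conf)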

Page 26: Top 5 mistakes when writing Spark applications

26

Dynamic allocation helps with this though, right?

• Dynamic allocation allows Spark to dynamically scale the cluster resources allocated to your application based on the workload.

• Works with Spark-On-Yarn
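
A minimal sketch of turning it on (the external shuffle service must also be running on the NodeManagers; the min/max bounds below are purely illustrative):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")       // external shuffle service, needed on YARN
    .set("spark.dynamicAllocation.minExecutors", "2")   // illustrative lower bound
    .set("spark.dynamicAllocation.maxExecutors", "17")  // illustrative upper bound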

Page 27: Top 5 mistakes when writing Spark applications

27

Decisions with Dynamic Allocation

• Number of executors (--num-executors)
• Cores for each executor (--executor-cores)
• Memory for each executor (--executor-memory)

• 6 nodes
• 16 cores each
• 64 GB of RAM

Page 28: Top 5 mistakes when writing Spark applications

28

Read more

• From a great blog post on this topic by Sandy Ryza: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

Page 29: Top 5 mistakes when writing Spark applications

29

Mistake # 2

Page 30: Top 5 mistakes when writing Spark applications

30

Application failure

15/04/16 14:13:03 WARN scheduler.TaskSetManager: Lost task 19.0 in stage 6.0 (TID 120, 10.215.149.47):
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
  at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
  at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:123)
  at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:132)
  at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:517)
  at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:432)
  at org.apache.spark.storage.BlockManager.get(BlockManager.scala:618)
  at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:146)
  at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)

Page 31: Top 5 mistakes when writing Spark applications

31

Why?

• No Spark shuffle block can be greater than 2 GB

Page 32: Top 5 mistakes when writing Spark applications

32

Ok, what’s a shuffle block again?

• In MapReduce terminology, a file written from one Mapper for a Reducer
• The Reducer makes a local copy of this file (reducer local copy) and then ‘reduces’ it

Page 33: Top 5 mistakes when writing Spark applications

33

Defining shuffle and partition

Each yellow arrow in this diagram represents a shuffle block.

Each blue block is a partition.

Page 34: Top 5 mistakes when writing Spark applications

34

Once again

• Overflow exception if shuffle block size > 2 GB

Page 35: Top 5 mistakes when writing Spark applications

35

What’s going on here?

• Spark uses ByteBuffer as the abstraction for blocks

  val buf = ByteBuffer.allocate(length.toInt)

• ByteBuffer is limited by Integer.MAX_VALUE (2 GB)!

Page 36: Top 5 mistakes when writing Spark applications

36

Spark SQL

• Especially problematic for Spark SQL
• Default number of partitions to use when doing shuffles is 200
  – This low number of partitions leads to high shuffle block size

Page 37: Top 5 mistakes when writing Spark applications

37

Umm, ok, so what can I do?

1. Increase the number of partitions
   – Thereby reducing the average partition size
2. Get rid of skew in your data
   – More on that later

Page 38: Top 5 mistakes when writing Spark applications

38

Umm, how exactly?

• In Spark SQL, increase the value of spark.sql.shuffle.partitions

• In regular Spark applications, use rdd.repartition() or rdd.coalesce()
  (the latter to reduce #partitions, if needed)
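
A minimal sketch of both, assuming the sc and sqlContext provided by spark-shell (the RDD and the partition counts are made up):

  // Spark SQL: raise the shuffle partition count before the shuffle-heavy query runs.
  sqlContext.setConf("spark.sql.shuffle.partitions", "800")

  // Core Spark: repartition up (full shuffle) or coalesce down (avoids a full shuffle).
  val someRdd = sc.parallelize(1 to 1000000)   // stand-in for a real dataset
  val morePartitions  = someRdd.repartition(800)
  val fewerPartitions = someRdd.coalesce(200)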

Page 39: Top 5 mistakes when writing Spark applications

39

But, how many partitions should I have?

• Rule of thumb is around 128 MB per partition
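
For instance, a rough sketch of turning that rule of thumb into a number (the 1 TB shuffle size is just an example figure):

  // ~1 TB of shuffle data at ~128 MB per partition => 8192 partitions.
  val totalShuffleBytes    = 1024L * 1024 * 1024 * 1024    // example: ~1 TB
  val targetPartitionBytes = 128L * 1024 * 1024             // ~128 MB
  val numPartitions = math.ceil(totalShuffleBytes.toDouble / targetPartitionBytes).toInt  // 8192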

Page 40: Top 5 mistakes when writing Spark applications

40

But! There’s more!

• Spark uses a different data structure for bookkeeping during shuffles, when the number of partitions is less than 2000, vs. more than 2000.

Page 41: Top 5 mistakes when writing Spark applications

41

Don’t believe me?

• In MapStatus.scala:

  def apply(loc: BlockManagerId, uncompressedSizes: Array[Long]): MapStatus = {
    if (uncompressedSizes.length > 2000) {
      HighlyCompressedMapStatus(loc, uncompressedSizes)
    } else {
      new CompressedMapStatus(loc, uncompressedSizes)
    }
  }

Page 42: Top 5 mistakes when writing Spark applications

42

Ok, so what are you saying?

If number of partitions < 2000, but not by much, bump it to be slightly higher than 2000.

Page 43: Top 5 mistakes when writing Spark applications

43

Can you summarize, please?

• Don’t have too big partitions
  – Your job will fail due to the 2 GB limit
• Don’t have too few partitions
  – Your job will be slow, not making use of parallelism
• Rule of thumb: ~128 MB per partition
• If #partitions < 2000, but close, bump to just > 2000
• Track SPARK-6235 for removing the various 2 GB limits

Page 44: Top 5 mistakes when writing Spark applications

44

Mistake # 3

Page 45: Top 5 mistakes when writing Spark applications

45

Slow jobs on Join/Shuffle

• Your dataset takes 20 seconds to run over with a map job, but takes 4 hours when joined or shuffled. What’s wrong?

Page 46: Top 5 mistakes when writing Spark applications

46

Mistake - Skew

[Diagram: Normal (one single thread) vs. Distributed (many single threads working in parallel)]

The Holy Grail of Distributed Systems

Page 47: Top 5 mistakes when writing Spark applications

47

Mistake - Skew

[Diagram: Normal (single thread) vs. Distributed, as before]

What about Skew, because that is a thing

Page 48: Top 5 mistakes when writing Spark applications

48

Mistake – Skew : Answers

• Salting
• Isolated Salting
• Isolated Map Joins

Page 49: Top 5 mistakes when writing Spark applications

49

Mistake – Skew : Salting

• Normal Key: “Foo”
• Salted Key: “Foo” + random.nextInt(saltFactor)

Page 50: Top 5 mistakes when writing Spark applications

50

Managing Parallelism

Page 51: Top 5 mistakes when writing Spark applications

51

Mistake – Skew: Salting

Page 52: Top 5 mistakes when writing Spark applications

52

Add Example Slide

Page 53: Top 5 mistakes when writing Spark applications

53

Mistake – Skew : Salting

• Two Stage Aggregation
  – Stage one does the operations on the salted keys
  – Stage two operates on the unsalted key results

[Diagram: Data Source → Map (convert to salted key & value tuple) → ReduceBy salted key → Map (convert results to key & value tuple) → ReduceBy key → Results]
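
A minimal sketch of this two-stage pattern for a skewed sum, assuming the sc from spark-shell (the toy data, salt factor and variable names are illustrative):

  import scala.util.Random

  // Tiny skewed example: the key "Foo" dominates the data.
  val skewedPairs = sc.parallelize(Seq.fill(100000)(("Foo", 1L)) ++ Seq(("Bar", 1L), ("Baz", 1L)))
  val saltFactor = 100

  // Stage one: reduce on salted keys, so one hot key is spread across many tasks.
  val saltedSums = skewedPairs
    .map { case (k, v) => ((k, Random.nextInt(saltFactor)), v) }
    .reduceByKey(_ + _)

  // Stage two: drop the salt and reduce again on the real key.
  val finalSums = saltedSums
    .map { case ((k, _), partial) => (k, partial) }
    .reduceByKey(_ + _)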

Page 54: Top 5 mistakes when writing Spark applications

54

Mistake – Skew : Isolated Salting

• Second stage only required for isolated keys

[Diagram: Data Source → Map (convert to key & value; isolate the hot keys and convert those to salted key & value tuples) → ReduceBy key & salted key → Filter isolated keys from salted keys → Map (convert results to key & value tuple) → ReduceBy key → Union to results]
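
A sketch of the isolated variant, where only a known list of hot keys gets salted and everything else takes the normal single-stage path (the hot-key list, toy data and names are assumptions):

  import scala.util.Random

  val skewedPairs = sc.parallelize(Seq.fill(100000)(("Foo", 1L)) ++ Seq(("Bar", 1L), ("Baz", 1L)))
  val hotKeys = Set("Foo")   // assumed list of isolated (skewed) keys
  val saltFactor = 100

  // Isolated keys: salted two-stage aggregation, as before.
  val hotSums = skewedPairs
    .filter { case (k, _) => hotKeys.contains(k) }
    .map { case (k, v) => ((k, Random.nextInt(saltFactor)), v) }
    .reduceByKey(_ + _)
    .map { case ((k, _), partial) => (k, partial) }
    .reduceByKey(_ + _)

  // Everything else: one ordinary reduce, then union the two result sets.
  val result = skewedPairs
    .filter { case (k, _) => !hotKeys.contains(k) }
    .reduceByKey(_ + _)
    .union(hotSums)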

Page 55: Top 5 mistakes when writing Spark applications

55

Mistake – Skew : Isolated Map Join

• Filter out the isolated keys and use a Map Join/Aggregate on those
• And a normal reduce on the rest of the data
• This can remove a large amount of data being shuffled

[Diagram: Data Source → Filter normal keys from isolated keys → (normal keys) ReduceBy normal key; (isolated keys) Map Join → Union to results]
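
A sketch of that idea as a broadcast map join for the isolated keys plus a normal shuffled join for the rest (the toy datasets, key list and names are all made up; it assumes at most one right-side row per hot key):

  // "Foo" is the hot key in this toy example.
  val left    = sc.parallelize(Seq(("Foo", 1), ("Foo", 2), ("Bar", 3)))
  val right   = sc.parallelize(Seq(("Foo", "x"), ("Bar", "y")))
  val hotKeys = Set("Foo")

  // Broadcast the right-side rows for the hot keys and join them map-side (no shuffle).
  val hotLookup = sc.broadcast(
    right.filter { case (k, _) => hotKeys.contains(k) }.collectAsMap())
  val hotJoined = left
    .filter { case (k, _) => hotKeys.contains(k) }
    .flatMap { case (k, v) => hotLookup.value.get(k).map(r => (k, (v, r))) }

  // Remaining keys: ordinary shuffled join, then union with the map-joined hot keys.
  val normalJoined = left.filter { case (k, _) => !hotKeys.contains(k) }
    .join(right.filter { case (k, _) => !hotKeys.contains(k) })
  val joined = hotJoined.union(normalJoined)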

Page 56: Top 5 mistakes when writing Spark applications

56

Managing Parallelism: Cartesian Join

[Diagram: each map task's shuffle output is joined against every piece of the other side's output before reaching the reduce tasks; the amount of data can blow up 10x, 100x, 1000x, 10000x, 100000x, 1000000x or more]

Page 57: Top 5 mistakes when writing Spark applications

57

Managing Parallelism

• How to fight the Cartesian join
  – Nested structures

[Diagram: joining Table X (A,1; A,2; A,3) with Table Y (A,4; A,5; A,6) on key A yields every combination: A,1,4; A,2,4; A,3,4; A,1,5; A,2,5; A,3,5; A,1,6; A,2,6; A,3,6. OR: keep a nested Table X with a single record for key A holding the rows A,1 through A,6.]

Page 58: Top 5 mistakes when writing Spark applications

58

Managing Parallelism

• How to fight the Cartesian join
  – Nested structures

  create table nestedTable (
    col1 string,
    col2 string,
    col3 array<struct<col3_1: string, col3_2: string>>)

  val rddNested = sc.parallelize(Array(
    Row("a1", "b1", Seq(Row("c1_1", "c2_1"),
                        Row("c1_2", "c2_2"),
                        Row("c1_3", "c2_3"))),
    Row("a2", "b2", Seq(Row("c1_2", "c2_2"),
                        Row("c1_3", "c2_3"),
                        Row("c1_4", "c2_4")))), 2)
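
A sketch of putting that RDD behind the nested schema and querying it, assuming the sqlContext from spark-shell and the rddNested defined above (Spark 1.x API; the query is just an example):

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types._

  val nestedSchema = StructType(Seq(
    StructField("col1", StringType),
    StructField("col2", StringType),
    StructField("col3", ArrayType(StructType(Seq(
      StructField("col3_1", StringType),
      StructField("col3_2", StringType)))))))

  val nestedDf = sqlContext.createDataFrame(rddNested, nestedSchema)
  nestedDf.registerTempTable("nestedTable")
  sqlContext.sql("SELECT col1, col3 FROM nestedTable").show()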

Page 59: Top 5 mistakes when writing Spark applications

59

Mistake # 4

Page 60: Top 5 mistakes when writing Spark applications

60

Out of luck?

• Do you ever run out of memory?
• Do you ever have more than 20 stages?
• Is your driver doing a lot of work?

Page 61: Top 5 mistakes when writing Spark applications

61

Mistake – DAG Management

• Shuffles are to be avoided
• ReduceByKey over GroupByKey
• TreeReduce over Reduce
• Use Complex/Nested Types

Page 62: Top 5 mistakes when writing Spark applications

62

Mistake – DAG Management: Shuffles

• Map-side reduction, where possible
• Think about partitioning/bucketing ahead of time
• Do as much as possible with a single shuffle
• Only send what you have to send
• Avoid skew and Cartesians

Page 63: Top 5 mistakes when writing Spark applications

63

ReduceByKey over GroupByKey

• ReduceByKey can do almost anything that GroupByKey can do
  – Aggregations
  – Windowing
  – Use memory
  – But you have more control
• ReduceByKey has a fixed limit of memory requirements
• GroupByKey is unbounded and dependent on the data
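
A minimal sketch of the same count done both ways, assuming the sc from spark-shell (toy data):

  val pairs = sc.parallelize(Seq("a", "b", "a", "c", "a")).map(w => (w, 1))

  // reduceByKey combines values map-side before the shuffle: bounded memory per key.
  val counts = pairs.reduceByKey(_ + _)

  // groupByKey ships every value for a key to one task before summing:
  // its memory use is unbounded and depends on the data.
  val countsViaGroup = pairs.groupByKey().mapValues(_.sum)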

Page 64: Top 5 mistakes when writing Spark applications

64

TreeReduce over Reduce

• TreeReduce & Reduce return some result to the driver
• TreeReduce does more work on the executors
• While Reduce brings everything back to the driver

[Diagram: with Reduce, all partitions send 100% of their results straight to the driver; with TreeReduce, the 4 partitions (25% each) are combined on executors before the final result reaches the driver]
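
A minimal sketch of the two calls, assuming the sc from spark-shell (depth 2 is treeReduce's default):

  val nums = sc.parallelize(1L to 1000000L, 8)

  // reduce: every partition's result is sent straight back to the driver.
  val total = nums.reduce(_ + _)

  // treeReduce: partial results are combined on the executors first, in a tree of the given depth.
  val totalTree = nums.treeReduce(_ + _, depth = 2)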

Page 65: Top 5 mistakes when writing Spark applications

65

Complex Types

• Top N List
• Multiple types of aggregations
• Windowing operations

• All in one pass

Page 66: Top 5 mistakes when writing Spark applications

66

Complex Types

• Think outside of the box: use objects to reduce by
• (Make something simple)
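
For instance, a sketch of keeping a per-key top-3 list as the object being reduced, all in one pass (the data and the value of N are illustrative):

  val scores = sc.parallelize(Seq(("a", 5), ("a", 9), ("a", 1), ("a", 7), ("b", 3), ("b", 8)))
  val n = 3

  // The aggregation object is a small sorted list; one shuffle produces the per-key top N.
  val topN = scores.aggregateByKey(List.empty[Int])(
    (acc, v) => (v :: acc).sorted(Ordering[Int].reverse).take(n),   // fold one value into the list
    (a, b)   => (a ++ b).sorted(Ordering[Int].reverse).take(n))     // merge two partial lists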

Page 67: Top 5 mistakes when writing Spark applications

67

Mistake # 5

Page 68: Top 5 mistakes when writing Spark applications

68

Ever seen this?

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
  at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
  at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
  at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
  at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
  at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
  at ...

Page 69: Top 5 mistakes when writing Spark applications

69

But!

• I already included protobuf in my app’s maven dependencies?

Page 70: Top 5 mistakes when writing Spark applications

70

Ah!

• My protobuf version doesn’t match with Spark’s protobuf version!

Page 71: Top 5 mistakes when writing Spark applications

71

Shading

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  ...
  <relocations>
    <relocation>
      <pattern>com.google.protobuf</pattern>
      <shadedPattern>com.company.my.protobuf</shadedPattern>
    </relocation>
  </relocations>
</plugin>
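
With a relocation like this, the shade plugin rewrites the bytecode of the classes bundled into your application jar so they reference com.company.my.protobuf instead of com.google.protobuf; your copy can then no longer collide with whatever protobuf version Spark and Hadoop put on the classpath.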

Page 72: Top 5 mistakes when writing Spark applications

72

Future of shading

• Spark 2.0 has some libraries shaded
• Guava is fully shaded

Page 73: Top 5 mistakes when writing Spark applications

73

Summary

Page 74: Top 5 mistakes when writing Spark applications

74

5 Mistakes

• Size up your executors right
• 2 GB limit on Spark shuffle blocks
• Evil thing about skew and Cartesians
• Learn to manage your DAG, yo!
• Do shady stuff, don’t let classpath leaks mess you up

Page 75: Top 5 mistakes when writing Spark applications

75

THANK YOU.
tiny.cloudera.com/spark-mistakes

Mark Grover | @mark_grover
Ted Malaska | @TedMalaska

