
Spark Tuning for Enterprise System Administrators

Date post: 16-Apr-2017
Upload: alpine-data
Page 1: Spark Tuning for Enterprise System Administrators

Spark Tuning for Enterprise System Administrators

Anya T. Bida, PhD Rachel B. Warren

Page 2: Spark Tuning for Enterprise System Administrators

Don't worry about missing something...

Video: https://www.youtube.com/watch?v=DNWaMR8uKDc&feature=youtu.be
Presentation: http://www.slideshare.net/anyabida
Cheat-sheet: http://techsuppdiva.github.io/
Anya: https://www.linkedin.com/in/anyabida
Rachel: https://www.linkedin.com/in/rachelbwarren

Page 3: Spark Tuning for Enterprise System Administrators

About Anya: Operations Engineer

About Rachel: Spark & Scala Enthusiast / Data Engineer

Alpine Data, alpinenow.com

Page 4: Spark Tuning for Enterprise System Administrators

About You

Spark practitioners

[Diagram: mySparkApp success ranges from Intermittent to Reliable to Optimal.]

Page 5: Spark Tuning for Enterprise System Administrators

[Diagram: mySparkApp success ranges from Intermittent to Reliable to Optimal.]

Page 6: Spark Tuning for Enterprise System Administrators

Default != Recommended

Example: By default, spark.executor.memory = 1g. 1g allows small jobs to finish out of the box. Spark assumes you'll increase this parameter.
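One way to override the 1g default is to pass the setting on the command line. The sketch below only builds the command string and does not run Spark. The application file name mySparkApp.py and the value 4g are assumptions chosen for illustration, not recommendations from the talk.

```python
import shlex


def spark_submit_command(app, conf):
    """Build a spark-submit command line with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical application file and memory size, for illustration only.
command = spark_submit_command("mySparkApp.py", {"spark.executor.memory": "4g"})
print(command)
```

The same key and value could instead be placed in spark-defaults.conf, which applies them to every application submitted from that host.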

Page 7: Spark Tuning for Enterprise System Administrators

Default != Recommended

Which parameters are important?

How do I configure them?

Page 8: Spark Tuning for Enterprise System Administrators

The Spark Tuning Cheat-Sheet

Filter* data before an expensive reduce or aggregation
consider* coalesce(
Use* data structures that require less memory
Serialize*
    PySpark: serializing is built-in
    Scala/Java? persist(StorageLevel.[*]_SER)
    Recommended: KryoSerializer *
tuning.html#tuning-data-structures
See "Optimize partitions." *
See "GC investigation." *
See "Checkpointing." *

Page 9: Spark Tuning for Enterprise System Administrators

[Diagram: mySparkApp success ranges from Intermittent to Reliable to Optimal. Labeled stages: Initial config, Memory trouble.]


Page 11: Spark Tuning for Enterprise System Administrators

How many in the audience have their own cluster?


Page 13: Spark Tuning for Enterprise System Administrators

Fair Schedulers

YARN
<allocations>
  <queue name="sample_queue">
    <minResources>4000 mb,0vcores</minResources>
    <maxResources>8000 mb,8vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>

SPARK
<allocations>
  <pool name="sample_queue">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
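To see what limits a queue actually imposes, the YARN allocation above can be read programmatically. This is a minimal sketch using only the Python standard library. The helper names parse_resources and queue_limits are hypothetical, and the resource-string pattern is an assumption based on the "4000 mb,0vcores" form shown on the slide.

```python
import re
import xml.etree.ElementTree as ET

# Same YARN Fair Scheduler allocation as shown on the slide.
YARN_ALLOCATIONS = """
<allocations>
  <queue name="sample_queue">
    <minResources>4000 mb,0vcores</minResources>
    <maxResources>8000 mb,8vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
"""


def parse_resources(text):
    """Split a value such as '8000 mb,8vcores' into (memory_mb, vcores)."""
    match = re.fullmatch(r"\s*(\d+)\s*mb\s*,\s*(\d+)\s*vcores\s*", text)
    if match is None:
        raise ValueError(f"unrecognized resource string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def queue_limits(xml_text, queue_name):
    """Return the min/max resources and running-app cap for one named queue."""
    root = ET.fromstring(xml_text)
    for queue in root.iter("queue"):
        if queue.get("name") == queue_name:
            return {
                "min": parse_resources(queue.findtext("minResources")),
                "max": parse_resources(queue.findtext("maxResources")),
                "max_running_apps": int(queue.findtext("maxRunningApps")),
            }
    raise KeyError(queue_name)


limits = queue_limits(YARN_ALLOCATIONS, "sample_queue")
print(limits["max"])
```

The maximum memory found here is the "pool" ceiling that the later memory-limit slides start from.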


Page 17: Spark Tuning for Enterprise System Administrators

Fair Schedulers

Use these parameters! (See the YARN queue and Spark pool allocations on Page 13.)

Page 18: Spark Tuning for Enterprise System Administrators

Fair Schedulers

YARN
<allocations>
  <user name="sample_user">
    <maxRunningApps>6</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>
</allocations>
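The per-user settings above combine an explicit limit with a fallback default. The sketch below shows that lookup order under the assumption that a named user's own maxRunningApps takes precedence over userMaxAppsDefault. The function name max_running_apps and the user name other_user are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same per-user allocation as shown on the slide.
USER_ALLOCATIONS = """
<allocations>
  <user name="sample_user">
    <maxRunningApps>6</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>
</allocations>
"""


def max_running_apps(xml_text, user_name):
    """Return the running-app cap for a user, falling back to the default."""
    root = ET.fromstring(xml_text)
    for user in root.iter("user"):
        if user.get("name") == user_name:
            limit = user.findtext("maxRunningApps")
            if limit is not None:
                return int(limit)
    default = root.findtext("userMaxAppsDefault")
    # None means the file sets no cap for this user.
    return int(default) if default is not None else None


print(max_running_apps(USER_ALLOCATIONS, "sample_user"))
print(max_running_apps(USER_ALLOCATIONS, "other_user"))
```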


Page 20: Spark Tuning for Enterprise System Administrators

What is the memory limit for mySparkApp?


Page 21: Spark Tuning for Enterprise System Administrators

Sidebar: Spark Architecture

[Diagram: a Driver, a Cluster Manager, and two Executors.]

Mark Grover: http://www.slideshare.net/SparkSummit/top-5-mistakes-when-writing-spark-applications-by-mark-grover-and-ted-malaska

Page 22: Spark Tuning for Enterprise System Administrators

What is the memory limit for mySparkApp?

Max memory in "pool" x 3/4 = mySparkApp_mem_limit


Page 24: Spark Tuning for Enterprise System Administrators

What is the memory limit for mySparkApp?

Max memory in "pool" x 3/4 = mySparkApp_mem_limit

Limitation: <maxResources>___mb</maxResources>

Page 25: Spark Tuning for Enterprise System Administrators

What is the memory limit for mySparkApp?

Max memory in "pool" x 3/4 = mySparkApp_mem_limit

Reserve 25% for overhead.
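The rule of thumb above is a single multiplication. As a worked example, the sketch below applies it to the 8000 mb maxResources value from the sample queue shown earlier in the deck. The function name app_memory_limit_mb is hypothetical.

```python
def app_memory_limit_mb(pool_max_memory_mb, usable_fraction=0.75):
    """Apply the slide's rule: keep 3/4 of the pool, reserve 1/4 for overhead."""
    return pool_max_memory_mb * usable_fraction


# 8000 mb is the maxResources value from the sample YARN queue.
limit_mb = app_memory_limit_mb(8000)
print(limit_mb)
```

With these inputs, 2000 mb stays in reserve and the remaining 6000 mb is the budget for mySparkApp.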


Page 28: Spark Tuning for Enterprise System Administrators

What is the memory limit for mySparkApp?

Max memory in "pool" x 3/4 = mySparkApp_mem_limit

mySparkApp_mem_limit > driver.memory + (executor.memory x dynamicAllocation.maxExecutors)


Page 30: Spark Tuning for Enterprise System Administrators

What is the memory limit for mySparkApp?

Max memory in "pool" x 3/4 = mySparkApp_mem_limit

mySparkApp_mem_limit > driver.memory + (executor.memory x dynamicAllocation.maxExecutors)

Limitation: Driver must not be larger than a single node.
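Both conditions above can be checked before submitting a job. The following is a minimal sketch with all sizes in megabytes. The function names and the example sizes (1024 MB driver and executors, an 8192 MB node) are assumptions for illustration, and the per-node capacity is assumed to be whatever yarn.nodemanager.resource.memory-mb is set to on the cluster.

```python
def fits_memory_limit(limit_mb, driver_mb, executor_mb, max_executors):
    """Check mySparkApp_mem_limit > driver + executor x maxExecutors."""
    return limit_mb > driver_mb + executor_mb * max_executors


def driver_fits_node(driver_mb, node_memory_mb):
    """Check that the driver is no larger than a single node."""
    return driver_mb <= node_memory_mb


# Hypothetical sizes: 6000 MB limit (3/4 of an 8000 MB pool), 1 GB containers.
print(fits_memory_limit(6000, 1024, 1024, 4))   # 1024 + 4096 = 5120
print(fits_memory_limit(6000, 1024, 1024, 5))   # 1024 + 5120 = 6144
print(driver_fits_node(1024, 8192))
```

If the first check fails, the options are to lower dynamicAllocation.maxExecutors, shrink executor.memory, or request a larger pool.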

Page 31: Spark Tuning for Enterprise System Administrators

[Diagram: the Driver Container, sized by spark.driver.memory, sits inside a node's yarn.nodemanager.resource.memory-mb.]


Page 34: Spark Tuning for Enterprise System Administrators

What is the memory limit for mySparkApp?

Max memory in "pool" x 3/4 = mySparkApp_mem_limit

mySparkApp_mem_limit > driver.memory + (executor.memory x dynamicAllocation.maxExecutors)

Verify my calculations respect this limitation.


Page 38: Spark Tuning for Enterprise System Administrators

mySparkApp memory issues

Page 39: Spark Tuning for Enterprise System Administrators

mySparkApp memory issues

Reduce the memory needed for mySparkApp. How?

Gracefully handle memory limitations. How?


Page 41: Spark Tuning for Enterprise System Administrators

mySparkApp memory issues

Reduce the memory needed for mySparkApp. How?

Gracefully handle memory limitations. How?

Here, let's talk about one scenario.

Page 43: Spark Tuning for Enterprise System Administrators

mySparkApp memory issues

Reduce the memory needed for mySparkApp. How?

Gracefully handle memory limitations. How?

persist(StorageLevel.[*]_SER)


Page 45: Spark Tuning for Enterprise System Administrators

mySparkApp memory issues

Reduce the memory needed for mySparkApp. How?

Gracefully handle memory limitations. How?

persist(StorageLevel.[*]_SER)

Recommended: KryoSerializer *
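The [*]_SER shorthand above stands for the serialized storage levels of the Scala/Java API. As an illustration, the sketch below expands that wildcard against the storage-level names defined in Spark 1.x and prints the spark-defaults.conf line that selects Kryo. It is plain Python that does not call Spark, and the variable names are hypothetical.

```python
from fnmatch import fnmatchcase

# Storage-level constant names from the Scala/Java StorageLevel object (Spark 1.x).
STORAGE_LEVELS = [
    "NONE", "DISK_ONLY", "DISK_ONLY_2",
    "MEMORY_ONLY", "MEMORY_ONLY_2",
    "MEMORY_ONLY_SER", "MEMORY_ONLY_SER_2",
    "MEMORY_AND_DISK", "MEMORY_AND_DISK_2",
    "MEMORY_AND_DISK_SER", "MEMORY_AND_DISK_SER_2",
    "OFF_HEAP",
]

# Expand the slide's wildcard: every level whose name ends in _SER.
serialized_levels = [name for name in STORAGE_LEVELS if fnmatchcase(name, "*_SER")]
print(serialized_levels)

# Serializer setting, written as a spark-defaults.conf line (key, space, value).
kryo_setting = ("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
print(" ".join(kryo_setting))
```

The _SER_2 variants store serialized data as well, with each partition replicated on two nodes.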


Page 48: Spark Tuning for Enterprise System Administrators

mySparkApp memory issues

Reduce the memory needed for mySparkApp. How?

Gracefully handle memory limitations. How?

Here, let's talk about one scenario.

Page 50: Spark Tuning for Enterprise System Administrators

Spark 1.1-1.5, Recommendation: Increase spark.storage.memoryFraction

Page 51: Spark Tuning for Enterprise System Administrators

Alexey Grishchenko: https://0x0fff.com/spark-memory-management/

Spark 1.1-1.5, Recommendation: Increase spark.storage.memoryFraction
Spark 1.6, Recommendation: UnifiedMemoryManager
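For the Spark 1.6 case, the arithmetic behind the unified memory manager can be sketched as follows. The constants are assumptions taken from the Spark 1.6 defaults (300 MB reserved, spark.memory.fraction = 0.75, spark.memory.storageFraction = 0.5). Later releases use different defaults, so check the documentation for your version. The function name and the 4096 MB heap are hypothetical.

```python
RESERVED_MB = 300  # memory Spark 1.6 sets aside before dividing the heap


def unified_memory_regions(heap_mb, memory_fraction=0.75, storage_fraction=0.5):
    """Estimate the region sizes (in MB) of an executor heap under Spark 1.6 defaults."""
    unified = (heap_mb - RESERVED_MB) * memory_fraction
    storage = unified * storage_fraction
    return {
        "unified": unified,                      # shared by storage and execution
        "storage": storage,                      # cached blocks protected from eviction
        "execution": unified - storage,          # shuffles, joins, sorts, aggregations
        "user": heap_mb - RESERVED_MB - unified, # user data structures and metadata
    }


regions = unified_memory_regions(4096)
print(regions)
```

The storage and execution figures are a soft split: under the unified manager either side can borrow unused space from the other.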

Page 52: Spark Tuning for Enterprise System Administrators

Alexey Grishchenko: https://0x0fff.com/spark-memory-management/
Sandy Ryza: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

[Diagram: within yarn.nodemanager.resource.memory-mb, the Executor Container holds spark.executor.memory plus spark.yarn.executor.memoryOverhead.]
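The container picture above implies a simple sizing calculation. The sketch below assumes the overhead defaults to the larger of 384 MB and 10% of spark.executor.memory. The factor has varied between Spark versions (Sandy Ryza's post describes 0.07), so treat both numbers as assumptions to confirm for your release. It also ignores any rounding that YARN applies to container sizes. The function names and the 16384 MB node are hypothetical.

```python
def executor_container_mb(executor_memory_mb, overhead_factor=0.10, min_overhead_mb=384):
    """Container size = spark.executor.memory + spark.yarn.executor.memoryOverhead."""
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead_mb


def executors_per_node(node_memory_mb, executor_memory_mb):
    """How many executor containers fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // executor_container_mb(executor_memory_mb)


container_mb = executor_container_mb(4096)    # 4096 + 409 overhead
per_node = executors_per_node(16384, 4096)    # hypothetical 16 GB node
print(container_mb, per_node)
```

In this example a 16 GB node holds three 4 GB executors rather than four, because the overhead counts against the node's capacity.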

Page 53: Spark Tuning for Enterprise System Administrators

Sidebar: Spark Architecture

[Diagram: a Driver, a Cluster Manager, and Executors. Each Executor runs in an Executor Container (spark.executor.memory plus spark.yarn.executor.memoryOverhead) inside a node's yarn.nodemanager.resource.memory-mb.]


Page 55: Spark Tuning for Enterprise System Administrators

[Diagram: mySparkApp success ranges from Intermittent to Reliable to Optimal. Labeled stages: Initial config, Memory trouble.]

Instead of 2.5 hours, myApp completes in 1 hour.

Page 56: Spark Tuning for Enterprise System Administrators

Cheat-sheet: techsuppdiva.github.io/

Page 57: Spark Tuning for Enterprise System Administrators

[Diagram: mySparkApp success ranges from Intermittent to Reliable to Optimal. Labeled stages: Initial config, Memory trouble.]

HighPerformanceSpark.com

Page 58: Spark Tuning for Enterprise System Administrators

Further Reading:

• Spark Tuning Cheat-sheet: techsuppdiva.github.io

• Apache Spark Documentation: https://spark.apache.org/docs/latest

• Checkpointing: http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing and https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-rdd-checkpointing.adoc

• Learning Spark, by H. Karau, A. Konwinski, P. Wendell, M. Zaharia, 2015

Page 59: Spark Tuning for Enterprise System Administrators

More Questions?

Video: https://www.youtube.com/watch?v=DNWaMR8uKDc&feature=youtu.be
Presentation: http://www.slideshare.net/anyabida
Cheat-sheet: http://techsuppdiva.github.io/
Anya: https://www.linkedin.com/in/anyabida
Rachel: https://www.linkedin.com/in/rachelbwarren

Thanks!

