Page 1: Failing gracefully

Failing Gracefully
Aaron Davidson

07/01/2014

Page 2: Failing gracefully

What does “failure” mean for Spark?

• Spark is a cluster-compute framework targeted at analytics workloads

• Supported failure modes:
  – Transient errors (e.g., network, HDFS outage)
  – Worker machine failures

• Unsupported failure modes:
  – Systemic exceptions (e.g., bad code, OOMs)
  – Driver machine failure

Page 3: Failing gracefully

What makes a recovery model good?

• A good recovery model should:
  – Be simple
  – Consistently make progress towards completion
  – Always be in use (“fail constantly”)

Page 4: Failing gracefully

Outline of this talk

• Spark architecture overview
• Common failures
• Special considerations for fault tolerance

Page 5: Failing gracefully

Example program

Goal: Find number of names per “first character”

sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), 1))
  .reduceByKey(_ + _)
  .collect()

[Diagram: one input partition containing the names うえしん, さいとう, うえだ]

Pages 6–8: Failing gracefully

Example program (continued)

Goal: Find number of names per “first character”

sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), 1))
  .reduceByKey(_ + _)
  .collect()

[Diagram, built up over three slides: the names うえしん, さいとう, うえだ are mapped to (う, 1), (さ, 1), (う, 1); reduceByKey combines them into (う, 2), (さ, 1); collect returns res0 = [(う, 2), (さ, 1)] to the driver.]
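A minimal runnable sketch of the same computation, for readers following along: it can be pasted into spark-shell and substitutes a small in-memory collection for the hdfs:/names input (an assumption for illustration only).

// Sketch only: a local stand-in for the hdfs:/names input.
val names = sc.parallelize(Seq("うえしん", "さいとう", "うえだ"))
val counts = names
  .map(name => (name.charAt(0), 1))  // (first character, 1)
  .reduceByKey(_ + _)                // sum the 1s per first character
  .collect()                         // Array((う,2), (さ,1)), matching res0 above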

Page 9: Failing gracefully

Spark Execution Model

1. Create DAG of RDDs to represent computation

2. Create logical execution plan for DAG

3. Schedule and execute individual tasks

Page 10: Failing gracefully

Step 1: Create RDDs

sc.textFile("hdfs:/names")

map(name => (name.charAt(0), 1))

reduceByKey(_ + _)

collect()

Page 11: Failing gracefully

Step 1: Create RDDs

HadoopRDD

map()

reduceByKey()

collect()

Page 12: Failing gracefully

Step 2: Create execution plan

• Pipeline as much as possible
• Split into “stages” based on the need to reorganize data

[Diagram: Stage 1 pipelines HadoopRDD and map(); a shuffle boundary separates it from Stage 2, which pipelines reduceByKey() and collect(). The example data flows from the input names through (う, 1), (さ, 1), (う, 1) to (う, 2), (さ, 1) and finally res0 = [(う, 2), (さ, 1)].]
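To see the pipelining and the stage split for this example, the RDD lineage can be printed; the sketch below assumes a spark-shell session, and the exact RDD class names in the output vary by Spark version.

// Sketch: inspect the lineage of the example RDD.
val counts = sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), 1))
  .reduceByKey(_ + _)

// Prints the chain of RDDs; the shuffle introduced by reduceByKey
// is where the boundary between Stage 1 and Stage 2 falls.
println(counts.toDebugString)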

Page 13: Failing gracefully

Step 3: Schedule tasks

• Split each stage into tasks
• A task is data + computation
• Execute all tasks within a stage before moving on

Page 14: Failing gracefully

Step 3: Schedule tasks

[Diagram: each Stage 1 task pairs the pipelined computation (HadoopRDD → map()) with one input partition – Task 0 → hdfs:/names/0.gz, Task 1 → hdfs:/names/1.gz, Task 2 → hdfs:/names/2.gz, Task 3 → hdfs:/names/3.gz.]

Pages 15–24: Failing gracefully

Step 3: Schedule tasks

[Animated diagram across ten slides: three executors, each co-located with an HDFS node holding some of the partitions /names/0.gz through /names/3.gz. Over time the scheduler assigns Stage 1 tasks (HadoopRDD → map()) to the executors, preferring machines that hold a task's input locally, until all four map tasks have completed.]

Page 25: Failing gracefully

The Shuffle

[Diagram: Stage 1 (HadoopRDD → map()) and Stage 2 (reduceByKey() → collect()), separated by the shuffle.]

Page 26: Failing gracefully

The Shuffle

• Redistributes data among partitions
• Hash keys into buckets
• On the reduce side, build a hashmap within each partition:
  Reduce 0: { う => 137, さ => 86, … }
  Reduce 1: { な => 144, る => 12, … }
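A minimal sketch of the bucketing idea (the helper below is illustrative, not a Spark API; it mirrors what a hash partitioner does with a key's hashCode):

// Sketch: map a key to one of numPartitions reduce-side buckets.
def bucketFor(key: Any, numPartitions: Int): Int = {
  val mod = key.hashCode % numPartitions
  if (mod < 0) mod + numPartitions else mod  // keep the bucket index non-negative
}

bucketFor('う', 2)  // every (う, 1) pair lands in the same bucket, so one reducer sees them all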

Page 27: Failing gracefully

The Shuffle

[Diagram: Stage 1 tasks write shuffle output to local disk; Stage 2 tasks pull it from there.]

• Pull-based, not push-based
• Write intermediate files to disk
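As a hedged aside (not on the slide): those intermediate shuffle files land under the executor's local directories, which are configurable. The paths below are assumptions for illustration; spark.local.dir is the actual setting.

// Sketch: direct shuffle/intermediate files to specific local disks.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")                                       // illustrative
  .setAppName("shuffle-local-dirs-demo")
  .set("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark") // illustrative paths
val sc = new SparkContext(conf)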

Page 28: Failing gracefully

Step 3: Schedule tasks

[Diagram: once Stage 1's map tasks (one per /names/*.gz partition) have finished, Stage 2's reduce tasks – Reduce 0 through Reduce 3, each running reduceByKey then collect – fetch the shuffled data and run.]

Page 29: Failing gracefully

When things go wrong

• Task failure
• Task taking a long time
• Executor failure

Page 30: Failing gracefully

Task Failure

• Task fails with exception → retry it
• RDDs are immutable and “stateless”, so rerunning should have the same effect
  – Special logic required for tasks that write data out (atomic rename)
  – Statelessness is not enforced by the programming model:

sc.parallelize(0 until 100).map { x =>
  val myVal = sys.prop("foo", 0) + x
  sys.prop("foo") = myVal
  myVal
}
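A hedged note on the retry behavior (not on the slide): the number of task failures tolerated before the whole job is aborted is a configuration property.

// Sketch: raise the per-task failure limit; spark.task.maxFailures is the real
// setting, and the value here is illustrative (the usual default is 4).
val conf = new org.apache.spark.SparkConf()
  .set("spark.task.maxFailures", "8")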

Page 31: Failing gracefully

Task Failure

[Diagram: the map task for /names/2.gz fails, so the scheduler retries it – a second HadoopRDD → map() task for /names/2.gz appears later on the timeline.]

Page 32: Failing gracefully

Speculative Execution

• Try to predict slow or failing tasks, and restart the task on a different machine in parallel
• Also assumes immutability and statelessness
• Enable with spark.speculation=true
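Beyond the on/off switch, Spark exposes knobs for how aggressively it speculates; a minimal sketch follows, with illustrative values.

// Sketch: enable speculation and tune when a running task counts as "slow".
val conf = new org.apache.spark.SparkConf()
  .set("spark.speculation", "true")            // enable speculative execution
  .set("spark.speculation.quantile", "0.75")   // start checking once 75% of tasks finish
  .set("spark.speculation.multiplier", "1.5")  // tasks 1.5x slower than the median are re-launched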

Pages 33–37: Failing gracefully

Speculative Execution

[Animated diagram across five slides: the map task for /names/3.gz is still running after the other Stage 1 tasks have finished, so the scheduler speculatively launches a second copy of it on another executor; whichever copy finishes first wins.]

Page 38: Failing gracefully

Executor Failure

• Examine the tasks that ran on that executor:
  – If a task was from the final stage, we've already collected its results – don't rerun
  – If a task was from an intermediate stage, it must be rerun
• May require re-executing a “finished” stage

Pages 39–41: Failing gracefully

Step 3: Schedule tasks (revisited after an executor failure)

[Animated diagram across three slides: the executor co-located with /names/0.gz and /names/3.gz is lost mid-job. Its final-stage task (Reduce 3) had already delivered its result to the driver, so it is not rerun; its intermediate-stage map task for /names/3.gz must be rerun, because that task's shuffle output lived on the lost machine, while the other completed map tasks are kept.]

Page 42: Failing gracefully

Other Failure Scenarios

What happens when:
1. We have a large number of stages?
2. Our input data is not immutable (e.g., streaming)?
3. Executors had cached data?

Page 43: Failing gracefully

1. Dealing with many stages

Problem: Executor loss causes recomputation of all non-final stages.

Solution: Checkpoint the whole RDD to HDFS periodically.

[Diagram (pages 43–44): a chain of seven stages, Stage 1 through Stage 7; once an intermediate RDD has been checkpointed to HDFS, an executor loss only forces recomputation back to the checkpoint rather than back to Stage 1.]
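A minimal sketch of the checkpointing the slide refers to, assuming an HDFS directory is available for checkpoints (the path is illustrative):

// Sketch: checkpoint an intermediate RDD so recovery can restart from HDFS
// instead of recomputing all earlier stages.
sc.setCheckpointDir("hdfs:/checkpoints")  // illustrative checkpoint location

val counts = sc.textFile("hdfs:/names")
  .map(name => (name.charAt(0), 1))
  .reduceByKey(_ + _)

counts.checkpoint()  // must be called before the first action on this RDD
counts.count()       // the action materializes the RDD and writes the checkpoint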

Page 45: Failing gracefully

2. Dealing with lost input data

Problem: Input data is consumed when read (e.g., streaming), and re-execution is not possible.

Solution: No general solution today – either use an HDFS source or implement it yourself. The Spark 1.2 roadmap includes a general solution, which may trade throughput for safety.

Page 46: Failing gracefully

3. Loss of cached data

Problem: Executor loss causes the cache to become incomplete.

Solution: Do nothing – a task caches data locally while it runs, causing the cache to stabilize.

Page 47: Failing gracefully

3. Loss of cached data

val file = sc.textFile("s3n://").cache()  // 8 blocks
for (i <- 0 until 10) {
  file.count()
}

[Diagram: on the first pass (i = 0) two executors each cache half of the 8 blocks – one holds Blocks 0, 2, 4, 6 and the other Blocks 1, 3, 5, 7 – so the second pass (i = 1) is served entirely from cache.]

Page 48: Failing gracefully

3. Loss of cached data

val file = sc.textFile("s3n://").cache()
for (i <- 0 until 10) {
  file.count()
}

[Diagram: the executor caching Blocks 0, 2, 4, 6 is lost after i = 0. On i = 1 the surviving executor still serves Blocks 1, 3, 5, 7 from cache while the missing blocks are recomputed; by i = 2 and i = 3 the recomputed blocks have been re-cached and the cache is complete again.]

Page 49: Failing gracefully

Conclusions

• Spark comes equipped to handle the most common forms of failure
• Special care must be taken in certain cases:
  – Highly iterative use-cases (checkpointing)
  – Streaming (atomic data consumption)
  – Violating Spark's core immutability and statelessness assumptions

