
Unraveling Hadoop Meltdown Mysteries

Date post: 11-Aug-2014
Upload: the-hive
Description:
As powerful and flexible as Hadoop is, jobs still sometimes fail or thrash unpredictably. Pepperdata co-founder and CEO Sean Suchter, one of the first commercial users of Hadoop in the early days at Yahoo, will give real-world examples of Hadoop meltdowns complete with metrics and what we can learn from them. He'll also show how to automatically increase Hadoop cluster throughput through fine-grained job hardware usage visibility.
Transcript
Page 1: Unraveling Hadoop Meltdown Mysteries

Meltdown Mysteries
Sean Suchter

Page 2: Unraveling Hadoop Meltdown Mysteries

Disks are thrashing!

Page 9: Unraveling Hadoop Meltdown Mysteries

Solution

• Make job author aware of surprising behavior.

• Modify job code & settings to be nicer to disks.

Page 10: Unraveling Hadoop Meltdown Mysteries

Nodes are dying!

Page 11: Unraveling Hadoop Meltdown Mysteries

Initial diagnosis…

• Nodes abruptly started swapping and becoming non-responsive. (Required physical power cycling)

• Job submitters report “I didn’t change anything”

• Question: What’s doing this to the cluster?

Page 18: Unraveling Hadoop Meltdown Mysteries

Cause & solution

• While the job didn’t change, its input data did.

• Stop that user’s jobs immediately.

• Better use of capacity scheduler virtual memory controls.

• Use Pepperdata protection to limit physical memory as well.
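
To illustrate why the bullets above call for both a virtual and a physical memory limit, here is a minimal Python sketch of a per-task limit check. Everything in it is an assumption made for illustration: the task records, the limit values, and the `over_limit` function are hypothetical, and this is not how the capacity scheduler or Pepperdata implement their controls. In a real cluster the enforcement happens inside Hadoop itself (for example, YARN's NodeManager can compare a container's virtual memory against its physical allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio`).

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    task_id: str
    virtual_mb: int   # virtual memory currently mapped by the task
    physical_mb: int  # resident (physical) memory currently in use


def over_limit(tasks, physical_limit_mb, vmem_ratio):
    """Return (task_id, reason) pairs for tasks exceeding either limit.

    The virtual limit is derived from the physical limit and a ratio,
    which is an assumed policy for this sketch.
    """
    virtual_limit_mb = physical_limit_mb * vmem_ratio
    flagged = []
    for t in tasks:
        if t.physical_mb > physical_limit_mb:
            flagged.append((t.task_id, "physical"))
        elif t.virtual_mb > virtual_limit_mb:
            flagged.append((t.task_id, "virtual"))
    return flagged


# Hypothetical samples; not taken from the talk.
tasks = [
    TaskUsage("task_a", virtual_mb=1800, physical_mb=900),
    TaskUsage("task_b", virtual_mb=2500, physical_mb=950),
    TaskUsage("task_c", virtual_mb=1500, physical_mb=1400),
]

flagged = over_limit(tasks, physical_limit_mb=1024, vmem_ratio=2.0)
print(flagged)
```

In this made-up data, `task_c` stays under the virtual limit while exceeding the physical one, which is the kind of case a virtual-memory-only check would miss.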

Page 19: Unraveling Hadoop Meltdown Mysteries

Take-away

• You see problems at the node level.

• You see the root causes at the task level.
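
The take-away can be sketched in a few lines of Python: the symptom shows up when usage is summed per node, but the cause only becomes visible when the same samples are broken down per task. The sample records, the per-node memory figure, and the helper names below are all hypothetical, chosen only to illustrate the idea; they are not data or tooling from the talk.

```python
from collections import defaultdict

# Hypothetical per-task samples: (node, job, task, resident memory in MB).
samples = [
    ("node1", "job_etl", "t1", 900),
    ("node1", "job_report", "t2", 7200),
    ("node1", "job_report", "t3", 6900),
    ("node2", "job_etl", "t4", 950),
    ("node2", "job_etl", "t5", 1000),
]

NODE_MEMORY_MB = 12000  # assumed physical memory per node


def node_totals(samples):
    """Node-level view: total resident memory per node."""
    totals = defaultdict(int)
    for node, _job, _task, mb in samples:
        totals[node] += mb
    return dict(totals)


def top_tasks(samples, node, n=2):
    """Task-level view: the n largest memory consumers on one node."""
    on_node = [s for s in samples if s[0] == node]
    return sorted(on_node, key=lambda s: s[3], reverse=True)[:n]


totals = node_totals(samples)
# The problem is visible here: which nodes are over their physical memory.
overloaded = [node for node, mb in totals.items() if mb > NODE_MEMORY_MB]
# The root cause is visible here: which tasks (and jobs) are responsible.
culprits = {node: top_tasks(samples, node) for node in overloaded}

print(overloaded)
print(culprits)
```

With these made-up numbers, only `node1` is over its memory budget, and the task-level breakdown points at the two `job_report` tasks rather than at the node itself.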

Page 20: Unraveling Hadoop Meltdown Mysteries

Pepperdata meetup tomorrow!

• War Stories from the Hadoop Trenches

• Allen Wittenauer (Apache Hadoop committer and former LinkedIn)

• Eric Baldeschwieler (former Hortonworks CEO / CTO)

• Todd Nemet (Looker; former Altiscale, ClearStory Data, Cloudera)

• 6pm Wed 6/25

• Firehouse Brewery, 111 S Murphy, Sunnyvale

• http://www.meetup.com/pepperdata/

