Hadoop testing workshop - july 2013

Date post: 28-Nov-2014
Upload: ophir-cohen
Presentation from the July 2013 workshop on how to test, monitor, and profile MapReduce jobs
Page 1: Hadoop testing workshop - july 2013

Hadoop Testing Workshop

Ophir Cohen
Data Platform Leader, [email protected]
July 2013

Agenda

1. Connection Before Content

2. Testing Fundamentals

3. Unit Tests

4. Integration Tests

5. Try it out

6. Performance

7. Diagnostics

Why Testing

1. Catch bugs early in the development cycle

2. Transparency of current project status

3. Easier development and refactoring: immediate feedback

4. Pushes developers to write better, more stable code

5. Shorter development cycle times

Why Automatic Testing?

It isn't really a question, right?

Testing Fundamentals

1. Unit testing - functional verification of each 'unit' (method / class in Java)

2. Integration testing - verifies that the system works as a whole

3. Performance testing - tests the efficiency of the program. Depends on code AND cluster architecture

4. Diagnostics - the way to find problems in production

--> 1 + 2 should be done BEFORE production

Unit Tests

Key Features

1. Simple (up to 10 lines)

2. Isolated (no DB connection, no cluster dependency, etc.)

3. Deterministic - PASS or FAIL

4. Automated (of course)

Why Unit Tests

1. Prevent regressions

2. Fast - no need for a full MR environment

3. Help with refactoring and updates

Unit Tests - MR jobs

Best Practices

1. Extract the tested code into an isolated method/class

2. Test pure Java, not the MR framework

3. Use the same package for tests

MRUnit

1. Library for MR unit tests

2. Apache project

3. Supports testing of mappers, reducers and the full job (without a full cluster)

4. Supports counters testing (nice!)

Unit Tests - Examples

Unit Tests Code Example
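
The original code for this slide is not part of the transcript. As a stand-in, here is a minimal sketch of best practices 1 and 2: the word-count logic is extracted into a plain Java class with no Hadoop dependency, and a short deterministic test exercises it. The names WordCountLogic and WordCountLogicTest, and the word-count job itself, are assumptions made for illustration; they are not taken from the workshop repository. The hand-rolled check method stands in for whatever test framework the project uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: the pure-Java logic that a word-count Mapper and
// Reducer would delegate to. No Hadoop imports, so no cluster is needed.
final class WordCountLogic {
    private WordCountLogic() {}

    // Splits a line into lower-case words (the keys a map() call would emit).
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Sums the counts of one key (the value a reduce() call would emit).
    static long sum(Iterable<Long> counts) {
        long total = 0;
        for (long count : counts) {
            total += count;
        }
        return total;
    }
}

// Hypothetical test: simple, isolated, deterministic (PASS or FAIL).
final class WordCountLogicTest {
    private static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError("FAIL: " + name);
        }
    }

    public static void main(String[] args) {
        check(WordCountLogic.tokenize("Hello, hadoop HELLO")
                .equals(Arrays.asList("hello", "hadoop", "hello")), "tokenize");
        check(WordCountLogic.tokenize("").isEmpty(), "empty line");
        check(WordCountLogic.sum(Arrays.asList(1L, 2L, 3L)) == 6L, "sum");
        System.out.println("PASS");
    }
}
```

With the logic isolated this way, the Mapper and Reducer classes shrink to thin wrappers, and MRUnit can be reserved for checking the wiring and the counters.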

Integration Tests - background

1. Unit tests test each unit (Mapper/Reducer); integration tests test the units working together

2. Test the integration with the framework

3. Not limited by data volumes

Integration Tests - tips and tricks

Tips and tricks

1. Use MiniMRCluster / MiniDFSCluster for tests

2. Use Linux

3. Make dev == production

4. Use data sampling:

a. Random sampling

b. Biased sampling

5. Apache BigTop (never tried that)

6. Use Cloudera CDH
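
The two sampling styles in tip 4 can be sketched in plain Java as below. This is one possible reading, not the workshop's code: "random sampling" is taken to mean keeping each record with a fixed probability, and "biased sampling" is taken to mean always keeping records of special interest (for example known edge cases) plus a random share of the rest. The class name Sampler and its parameters are hypothetical. A fixed seed keeps the sample reproducible, so a test built on it stays deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper for building small test inputs from production-like data.
final class Sampler {
    private Sampler() {}

    // Random sampling: keeps each record independently with probability `fraction`.
    static <T> List<T> randomSample(List<T> records, double fraction, long seed) {
        Random random = new Random(seed);
        List<T> sample = new ArrayList<>();
        for (T record : records) {
            if (random.nextDouble() < fraction) {
                sample.add(record);
            }
        }
        return sample;
    }

    // Biased sampling (as assumed here): always keeps records matching
    // `mustKeep`, and keeps the remaining records with probability `fraction`.
    static <T> List<T> biasedSample(List<T> records, Predicate<T> mustKeep,
                                    double fraction, long seed) {
        Random random = new Random(seed);
        List<T> sample = new ArrayList<>();
        for (T record : records) {
            if (mustKeep.test(record) || random.nextDouble() < fraction) {
                sample.add(record);
            }
        }
        return sample;
    }
}
```

For example, Sampler.biasedSample(lines, l -> l.contains("error"), 0.01, 42L) would keep every line containing "error" and roughly 1% of the others.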

Let's play a bit

1. Check out the code:

git clone https://github.com/ophchu/mapreduce-tutorials.git

2. Make sure you can run the mapper test

3. Complete the MRUnit tests for the reducer and full job

4. Play with the MiniMRCluster/MiniDFSCluster test

Performance

Profiling (at a glance...)

1. Profile your code

2. Measure and tune what matters to you

3. Benchmarking: micro and macro

4. Hadoop has a built-in profiler (e.g. using hprof)
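
For the "micro" side of point 3, a minimal sketch of a hand-rolled micro-benchmark is shown below. It times a single piece of pure Java (for example the parsing done inside a mapper) after some untimed warm-up calls. The class MicroBenchmark is hypothetical, and the numbers it produces are only rough indications: JIT compilation and garbage collection can distort them, and a dedicated harness is a better choice for serious measurements.

```java
import java.util.function.Supplier;

// Hypothetical, deliberately simple micro-benchmark helper.
final class MicroBenchmark {
    // Results are stored here so the JIT cannot discard the calls as unused.
    private static volatile Object sink;

    private MicroBenchmark() {}

    // Returns the average time per call in nanoseconds over `measuredRuns`
    // calls, after `warmupRuns` untimed calls.
    static <T> double averageNanos(Supplier<T> task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            sink = task.get();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredRuns;
    }
}
```

A call such as MicroBenchmark.averageNanos(() -> line.split("\t"), 10_000, 100_000) gives a first impression of what one parsing step costs; the macro side (whole jobs on a cluster) is covered by the cluster benchmarks.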

Cluster Performance

1. Terasort test

hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.1.2.jar teragen 1000 /user/dataint/terasort/input

hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.1.2.jar terasort /user/dataint/terasort/input /user/dataint/terasort/output

2. MRBench - MR benchmarking

hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench -numRuns 2 -maps 10 -reduces 10 -inputLines 100 -inputType random

3. NNBench - Name Node benchmarking

4. TestDFSIO - write and read performance

Diagnostics

1. Check the web UI (http://your_server:50030/jobtracker.jsp):

a. Nodes: how many up, how many down, check slots

b. Jobs: logs, failures, exceptions

c. Counters: do they show the expected values?

2. Configuration:

a. Check job conf (job.xml)

b. Check env conf (http://your_server:50030/conf)

3. Job history (http://your_server:50030/jobhistory.jsp)

4. Log dirs:

a. Job tracker (http://your_server:50030/logs/)

b. Task trackers
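
The counters check in 1c can also be automated once the counter values have been collected. The sketch below assumes the counters are already available as a plain map from counter name to value; how they are fetched from the job is left out. The class CounterCheck and the counter names in the usage note are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares the counters a job reported with the
// values that were expected for it.
final class CounterCheck {
    private CounterCheck() {}

    // Returns one message per expected counter whose actual value differs
    // or is missing; an empty list means every expectation was met.
    static List<String> mismatches(Map<String, Long> expected, Map<String, Long> actual) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Long> entry : expected.entrySet()) {
            Long got = actual.get(entry.getKey());
            if (!entry.getValue().equals(got)) {
                problems.add(entry.getKey() + ": expected " + entry.getValue() + ", got " + got);
            }
        }
        return problems;
    }
}
```

For example, expecting BAD_RECORDS to be 0 while the job reports 3 yields the single message "BAD_RECORDS: expected 0, got 3".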

Thanks

[email protected] ● @ophchu
