Docker on Hadoop
Daniel Templeton | Hadoop Committer @ Cloudera
© Cloudera, Inc. All rights reserved.

One Slide on Docker
● Same general idea as a VM
● BUT there’s only one OS image
● Partitioned process space
● Layered images
● Image repo

One Slide on Hadoop
● Three core components
  – HDFS
  – YARN
  – MapReduce
[Diagram: three stacks. MapReduce v1 on HDFS; MapReduce v2 on YARN on HDFS; MRv2, Spark, ... on YARN on HDFS]

Why Docker on Hadoop?
● Process isolation
  – CGroups for resource isolation
  – Adds process
● Environment isolation
  – Control execution environment
    • Libraries
    • JVM
    • OS
  – Unsafe operations

Launching Jobs
[Diagram: YARN components involved in launching a job: ResourceManager, NodeManager, ContainerExecutor, Process]

Container Executor
● DefaultContainerExecutor
  – Write a launch script
  – ProcessBuilder.start()
● LinuxContainerExecutor
  – Write a launch script
  – Launch native handler
    • Set UID
    • CGroups
    • Fork & exec
  – Required for secure mode
● DockerContainerExecutor
  – Write a launch script
  – ProcessBuilder.start()
  – docker run

Container Executor
● DefaultContainerExecutor
  – Write a launch script
  – ProcessBuilder.start()
● LinuxContainerExecutor
  – Write a launch script
  – Launch native handler
    • OR
  – Launch Docker handler
    • docker run
  – Required for secure mode
● DockerContainerExecutor (BORN 2.6.0, DIED 2.8.0)
  – Write a launch script
  – ProcessBuilder.start()
  – docker run

Secret Formula
How to run a Docker container through YARN
1. Set up LCE
2. Set up Docker
3. Configure yarn-site.xml
4. Configure container-executor.cfg
5. Prepare Docker image
6. Launch job

Set up LCE
● LCE uses container-executor binary
  – Must be owned by root
  – Group must be same as node manager's group
  – Must have setuid and setgid bits set
  – Must be r+x only by the node manager's group
  – Owner: root, Group: hadoop, Mode: 6050
● Which relies on container-executor.cfg
  – Must not be writable by anyone other than root
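
The ownership and mode requirements above could be checked with a small script along these lines. This is only a sketch: it assumes GNU stat (the -c option), the helper names describe_perms and verify_perms are made up for illustration, and the $HADOOP_HOME/bin path in the comments is a placeholder rather than a known install location.

```shell
#!/bin/sh
# Print "owner group mode" for a file. Assumes GNU stat (-c is not portable).
describe_perms() {
  stat -c '%U %G %a' "$1"
}

# Succeed only if the file has exactly the expected owner, group and octal mode.
# $1: path, $2: expected owner, $3: expected group, $4: expected mode
verify_perms() {
  [ "$(describe_perms "$1")" = "$2 $3 $4" ]
}

# On a node manager host, as root, the slide's settings would be applied and
# checked roughly like this (hypothetical path, group "hadoop" as on the slide):
#   chown root:hadoop "$HADOOP_HOME/bin/container-executor"
#   chmod 6050 "$HADOOP_HOME/bin/container-executor"
#   verify_perms "$HADOOP_HOME/bin/container-executor" root hadoop 6050
```

Mode 6050 sets the setuid and setgid bits (the leading 6) and grants read and execute to the group only (the 5), which is why the binary is usable by the node manager's group and by no one else.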

Set up Docker
● Docker must be installed on all node manager nodes
● (OR node labels can be used to label the Docker nodes)
  – Only capacity scheduler
  – Only one label per host
● May be a good idea to pre-cache images that will be used
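
Pre-caching could be as simple as pulling the images ahead of time on each node manager host. The helper below is a sketch under that assumption; precache_images is a made-up name and the image names are hypothetical. The pull command is passed in so the loop can be dry-run with echo before using the real docker pull.

```shell
#!/bin/sh
# Pull each named image with the given command so that later container
# launches find it in the local image cache.
# $1: command that fetches one image (e.g. "docker pull"); rest: image names
precache_images() {
  pull_cmd=$1
  shift
  for image in "$@"; do
    # $pull_cmd is deliberately unquoted so "docker pull" splits into words.
    $pull_cmd "$image" || echo "warning: could not pull $image" >&2
  done
}

# Dry run: print the commands instead of executing them.
# Both image names are placeholders.
precache_images "echo docker pull" hadoop my-registry.example.com/spark
```

Replacing "echo docker pull" with "docker pull" performs the real pulls.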

Configure yarn-site.xml
● yarn.nodemanager.container-executor.class =
  – org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
● yarn.nodemanager.linux-container-executor.group =
  – hadoop (or whatever group the node manager uses)
● yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users =
  – false (typically)
● yarn.nodemanager.runtime.linux.docker.allowed-container-networks
● yarn.nodemanager.runtime.linux.docker.default-container-network
● yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed
● yarn.nodemanager.runtime.linux.docker.privileged-containers.acl
● ...
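
As a rough illustration, the properties listed above might look like this in yarn-site.xml. The property names follow the slide. The values for the first three are the ones the slide gives; the values for the Docker network and privileged-container properties are assumptions chosen for illustration, not recommendations.

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
    <value>false</value>
  </property>
  <!-- Assumed values below; adjust to the networks configured on the hosts. -->
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,none,bridge</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
    <value>host</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
    <value>false</value>
  </property>
</configuration>
```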

Configure container-executor.cfg
● yarn.nodemanager.linux-container-executor.group =
  – hadoop (or whatever group the node manager uses)
● feature.docker.enabled =
  – 1 (i.e. true)
● min.user.id
● banned.users
● allowed.system.users
● docker.binary
● ...
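
A container-executor.cfg using the keys above might look like the following sketch. Only the first two values come from the slide; the user ID threshold, the user lists and the Docker binary path are assumptions for illustration and will differ per cluster. The comment line assumes the file format accepts lines starting with #.

```ini
yarn.nodemanager.linux-container-executor.group=hadoop
feature.docker.enabled=1
# The values below are illustrative assumptions, not taken from the slide.
min.user.id=1000
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=nobody
docker.binary=/usr/bin/docker
```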

Prepare the Docker Image
● Application owner (UID) must exist
● Execution requirements
  – Hadoop → JRE, Hadoop libraries, env vars
  – Must be compatible with cluster and other images
● No entry point, no command

Launch the Job
● Do whatever you normally do
● Use of Docker containers is managed through env vars
  – YARN_CONTAINER_RUNTIME_TYPE
  – YARN_CONTAINER_RUNTIME_DOCKER_IMAGE
  – YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE
  – YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK
  – YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER
  – YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS

Example: MapReduce
$ vars="YARN_CONTAINER_RUNTIME_TYPE=docker"
$ vars="$vars,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop"
$ hadoop jar hadoop-examples.jar pi \
    -Dyarn.app.mapreduce.am.env=$vars \
    -Dmapreduce.map.env=$vars \
    -Dmapreduce.reduce.env=$vars \
    10 100
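
When more of the runtime variables are needed, building the comma-separated list by hand gets error-prone. A small helper like the one below could assemble it; join_env is a made-up name for illustration, and the two variables shown are the ones from the MapReduce example.

```shell
#!/bin/sh
# Join NAME=value pairs with commas, the form passed to the *.env job
# properties in the MapReduce example. The subshell keeps the IFS change local.
join_env() {
  ( IFS=,; printf '%s\n' "$*" )
}

vars=$(join_env \
  "YARN_CONTAINER_RUNTIME_TYPE=docker" \
  "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop")

echo "$vars"
# prints YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop
```

The resulting $vars can then be passed to the -D options exactly as in the example.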

Example: Spark
$ spark-shell --master yarn \
    --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
    --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop \
    --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop \
    --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker

Caveats
● Application owner must exist in Docker container
  – Limits flexibility of containers
  – Automatically mounts in /etc/passwd
    • Bad solution
    • Broken
    • Removed in Hadoop 2.9/3.0 (YARN-5394)
  – Discussion on YARN-5360 and YARN-4266

Caveats
● Application owner must exist in Docker container
● Hadoop artifacts must exist in Docker containers
  – Docker containers must be self-contained
  – HDFS access, deserializing tokens, etc.
  – Versions must be compatible
  – Complicates cluster upgrades
  – YARN-5534 will allow whitelisted volume mounts

Caveats
● Application owner must exist in Docker container
● Hadoop artifacts must exist in Docker containers
● Large images may fail
  – Images that aren't cached are implicitly pulled
  – Large images may take a while
  – MapReduce and Spark time out after 10 minutes
  – YARN-3854 is a step towards a solution

Caveats
● Application owner must exist in Docker container
● Hadoop artifacts must exist in Docker containers
● Large images may fail
● No real support for secure image repos
  – Docker stores credentials in client config
  – Always set to $HOME/.docker/config.json
  – YARN-5428 will make the client config configurable

Caveats
● Application owner must exist in Docker container
● Hadoop artifacts must exist in Docker containers
● Large images may fail
● No real support for secure image repos
● Basic support for networks
  – Containers can request any configured network
  – No port mapping
  – No pods
  – No management of overlay networks

Caveats
● Application owner must exist in Docker container
● Hadoop artifacts must exist in Docker containers
● Large images may fail
● No real support for secure image repos
● Basic support for networks
● Security implications
  – Privileged container execution
  – Setuid binary
  – Volume mounts (when YARN-3384 is complete)

Caveats
● Application owner must exist in Docker container
● Hadoop artifacts must exist in Docker containers
● Large images may fail
● No real support for secure image repos
● Basic support for networks
● Security implications
● Not really useful before Hadoop 2.9/3.0
  – YARN-5298: Mounts localized file directories as volumes
  – YARN-4553: CGroups support
  – YARN-4007: Support different networking options
  – YARN-5258: Documentation

Apache Slider
● YARN is traditionally a job scheduler
● What about services?
● Slider simplifies running a service on YARN
  – Is itself a YARN application
  – Declarative
● Docker support as of Slider 0.80
  – Slider agent calls docker run
  – Unrelated to YARN Docker support

Slider in YARN
● Slider core moving into YARN
  – YARN-5079: Native YARN framework layer for services and beyond
● Slider agent is not being integrated
  – Using YARN instead
  – Docker support through YARN
● Currently only in yarn-native-services branch
  – Merge date not set yet
● “Classic” Slider will continue to be available

Summary
● Docker adds good things to YARN
  – There are a few thorns
● YARN natively supports Docker
  – Limited use until Hadoop 2.9/3.0
● Slider natively supports Docker
  – Slider is moving into YARN and adopting YARN's Docker support

https://aajisaka.github.io/hadoop-project/hadoop-yarn/hadoop-yarn-site/DockerContainers.html

Thank you
Daniel Templeton
Cloudera
@templedf

