Date post: 13-Mar-2020
Copyright © 2010-2015 Cloudera, Inc. All rights reserved. Not to be reproduced without prior written consent from Cloudera.

Cloudera Administrator Training for Apache Hadoop: Hands-On Exercises

General Notes
Setup Activity: Configuring Networking
Hands-On Exercise: Installing Cloudera Manager Server
Hands-on Exercise: Creating a Hadoop Cluster
Hands-On Exercise: Working With HDFS
Hands-On Exercise: Running YARN Applications
Hands-On Exercise: Explore Hadoop Configurations and Daemon Logs
Hands-On Exercise: Using Flume to Put Data into HDFS
Hands-On Exercise: Importing Data with Sqoop
Hands-On Exercise: Querying HDFS With Hive and Cloudera Impala
Hands-On Exercise: Using Hue to Control Hadoop User Access
Hands-On Exercise: Configuring HDFS High Availability
Hands-On Exercise: Using the Fair Scheduler
Hands-On Exercise: Breaking The Cluster
Hands-On Exercise: Verifying The Cluster’s Self-Healing Features
Hands-On Exercise: Taking HDFS Snapshots
Hands-On Exercise: Configuring Email Alerts
Troubleshooting Challenge: Heap O’ Trouble

201510


Appendix A: Setting up VMware Fusion on a Mac for the Cloud Training Environment
Appendix B: Setting up VirtualBox for the Cloud Training Environment


General Notes

Training Environment Overview

In this training course, you will use a cloud environment consisting of five Amazon EC2 instances. You will also use a local virtual machine (VM) to access the cloud environment.

All of the EC2 instances use the CentOS 6.4 Linux distribution.

Use the training user account to do your work. You should not need to enter passwords for the training user.

Should you need superuser access, you can use sudo as the training user without entering a password. The training user has unlimited, passwordless sudo privileges.

For the training environment:

• You will start the VM, and then use it to connect to your EC2 instances

• The EC2 instances have been configured so that you can connect to them without entering a password

• Your instructor will provide the following details for five EC2 instances per student:

o EC2 public IP addresses – You will run a script that adds the EC2 public IP addresses of your five EC2 instances to the /etc/hosts file on your VM

o EC2 private IP addresses – You will use these addresses when you run a script that configures the /etc/hosts file on your EC2 instances. EC2 private IP addresses start with the number 10, 172, or 192.168


o The instructor will not provide EC2 internal host names. Please use the host names elephant, tiger, horse, monkey and lion for the five internal host names. It is important that you use these five host names, because several scripts, which expect these five host names, have been provided for your use while you perform exercises. The scripts will not work if you use different host names.

• Please write out a table similar to the following with your EC2 instance information:

Host Name    EC2 Public IP Address    EC2 Private IP Address
elephant
tiger
horse
monkey
lion
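When filling in the table, it is easy to put a public address in the private column or the reverse. The following is a minimal sketch of a check, assuming the private addresses follow the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0 through 172.31.255.255, and 192.168.0.0/16). The function name is_private_ip and the sample addresses are illustrative and are not part of the course scripts.

```shell
# Return success if the IPv4 address falls in an RFC 1918 private range.
is_private_ip() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Classify a few sample addresses (illustrative values only).
for ip in 10.0.1.15 172.31.4.9 192.168.1.20 54.12.34.56; do
  if is_private_ip "$ip"; then
    echo "$ip private"
  else
    echo "$ip public"
  fi
done
```

Note that only part of the 172 range is private, so an address such as 172.32.0.1 is reported as public by this sketch.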

Notational Convention

In some command-line steps in the exercises, you will see lines like this:

$ hdfs dfs -put shakespeare \

/user/training/shakespeare

The backslash at the end of the first line signifies that the command is not completed, and continues on the next line. You can enter the code exactly as shown (on two lines), or you can enter it on a single line. If you do the latter, you should not type in the backslash.
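The equivalence of the two forms can be checked without a cluster. In this small sketch, echo stands in for actually running the hdfs command, so only the way the shell joins the lines is demonstrated.

```shell
# Two-line form: the trailing backslash joins the next line to this one.
two_line=$(echo hdfs dfs -put shakespeare \
  /user/training/shakespeare)

# Single-line form: same command, no backslash.
one_line=$(echo hdfs dfs -put shakespeare /user/training/shakespeare)

# Both forms produce the same argument list.
[ "$two_line" = "$one_line" ] && echo "identical"
```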


Copying and Pasting from the Hands-On Exercises Manual

If you wish, you can usually copy commands and strings from this Hands-On Exercises manual and paste them into your terminal sessions. However, please note one important exception:

If you copy and paste table entries that exceed a single line, a space may be inserted at the end of each line.

Dash (-) characters are especially problematic and do not always copy correctly. Be sure to delete any pasted dash characters and key them in manually after pasting in text that contains them.

Please use caution when copying and pasting when performing the exercises.
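One way to spot a badly copied dash before running a command is to look for bytes outside printable ASCII. The helper below is a sketch, not part of the course environment, and the name has_non_ascii is hypothetical. It also flags tabs and other control characters.

```shell
# Succeed if the argument contains any byte outside printable ASCII,
# for example an en dash pasted in place of a hyphen.
has_non_ascii() {
  printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

pasted='java –version'   # contains an en dash (U+2013), not a hyphen
typed='java -version'    # contains a plain ASCII hyphen

has_non_ascii "$pasted" && echo "pasted text needs retyping"
has_non_ascii "$typed"  || echo "typed text is clean"
```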

Resetting Your Cluster

You can use the reset_cluster.sh script to change the state of your cluster so that you can start with a fresh, correct environment for performing any exercise. Use the script in situations such as the following:

• While attempting one of the exercises, you misconfigure your machines so badly that attempting to do the next exercise is no longer possible.

• You have successfully completed a number of exercises, but then you receive an emergency call from work and you have to miss some time in class. When you return, you realize that you have missed two or three exercises. But you want to do the same exercise everyone else is doing so that you can benefit from the post-exercise discussions.

The script is destructive: any work that you have done is overwritten when the script runs. If you want to save files before running the script, you can copy the files you want to save to a subdirectory under /home/training.
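Saving files before a reset could look like the sketch below. The function name save_before_reset and the saved_ directory prefix are arbitrary choices for illustration. Only the /home/training location comes from the text above.

```shell
# Copy a file or directory into a timestamped subdirectory.
# The second argument defaults to /home/training; the "saved_" prefix is
# an arbitrary naming choice for this sketch.
save_before_reset() {
  src=$1
  dest_root=${2:-/home/training}
  dest="$dest_root/saved_$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$dest" && cp -r "$src" "$dest"/ && echo "$dest"
}

# Usage (hypothetical file name):
#   save_before_reset ~/my_notes.txt
```

The function prints the directory it created, so you can note where the copy went before running the reset.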

Before you attempt to run the script, verify that networking among the five hosts in your cluster is working. If networking has not been configured correctly, you can rerun the CM_config_hosts.sh script to reset the networking configuration prior to running the reset_cluster.sh script.

Run the script on elephant only. You do not need to change to a directory to run the script; it is in your shell’s PATH.

The script starts by prompting you to enter the number of an exercise. Specify the exercise you want to perform after the script has run. Then confirm that you want to reset the cluster (thus overwriting any work you have done).

The script will further prompt you to specify if you want to run only the steps that simulate the previous exercise, or if you want to completely uninstall and reinstall the cluster and then catch you up to the specified exercise. Note that choosing to only complete the previous exercise does not offer as strong an assurance of properly configuring your cluster as a full reset would do. It is however a more expedient option.

After you have responded to the initial prompts, the script begins by cleaning up your cluster: terminating Hadoop processes, removing Hadoop software, deleting Hadoop-related files, and reverting other changes you might have made to the hosts in your cluster. Please note that as this system cleanup phase is running, you will see errors such as “unrecognized service” and “No packages marked for removal.” These errors are expected. They occur because the script attempts to remove anything pertinent that might be on your cluster. The number of error messages that appear during this phase of script execution depends on the state the cluster is in when you start the script.

Next, the script simulates steps for each exercise up to the one you want to perform.

Script completion time varies from 5 minutes to almost an hour depending on how many exercise steps need to be simulated by the script.


Setup Activity: Configuring Networking

In this preparatory exercise you will configure networking for your five instances.

Task Overview

In this task, you will run scripts to configure networking.

First, you will start the local Get2EC2 VM and run a script to configure the /etc/hosts file on that VM with the addresses of the five EC2 instances and the host names elephant, tiger, horse, monkey and lion.

Next, you will run a script to configure the /etc/hosts file on the EC2 instances, setting the five host names to elephant, tiger, horse, monkey and lion.

Then you will verify that networking has been set up correctly.

Finally, you will run a script that starts a SOCKS5 proxy server on the local VM.

Steps to Complete

1. If you are using VMware Fusion on Mac OS to connect to your five EC2 instances, read ‘Appendix A, Setting up VMware Fusion on a Mac for the Cloud Training Environment’ and perform any indicated actions. After you have done so, continue to the next step.


2. Start the Get2EC2 VM.

You should find the VM on your Desktop, on the C: drive of your machine, or in the location to which you downloaded it. Double-click the icon whose name ends in .vmx to launch the VM.

After a few minutes, the GNOME interface will appear.

Note: A VirtualBox VM is available for students running Mac OS who are unable to or prefer not to install VMware Fusion. However, we strongly recommend using the VMware version VM. Use VirtualBox for this course only if it is your preferred virtualization environment and if you are knowledgeable enough to be self-sufficient to troubleshoot problems you might run into.

If you are using a VirtualBox VM, follow the steps in ‘Appendix B, Setting up VirtualBox for the Cloud Training Environment.’ When you have completed the steps in the Appendix, continue to the next step.

3. Run a script that modifies the /etc/hosts file on your VM by adding your five EC2 instances.

Enter the following command in the terminal window in the VM:

$ CM_config_local_hosts_file.sh

The CM_config_local_hosts_file.sh script will prompt you to enter the five EC2 public IP addresses for your five EC2 instances. Refer to the table you created when your instructor gave you the EC2 instance information.
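The manual does not show what the script writes. As a rough illustration of the standard /etc/hosts layout (an IP address followed by a host name), the sketch below prints one line per host. The function name make_hosts_lines is hypothetical, the addresses are placeholders from the 203.0.113.0/24 documentation range, and the real script may format its entries differently.

```shell
# Print /etc/hosts-style lines, pairing each IP address (in order)
# with the five host names used in this course.
make_hosts_lines() {
  for name in elephant tiger horse monkey lion; do
    [ $# -gt 0 ] || break
    printf '%s\t%s\n' "$1" "$name"
    shift
  done
}

# Placeholder addresses; substitute the public IPs from your table.
make_hosts_lines 203.0.113.11 203.0.113.12 203.0.113.13 203.0.113.14 203.0.113.15
```

After the course script has run, you can compare its result with this layout by viewing /etc/hosts on the VM.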

4. Log in to the elephant EC2 instance as the training user.

$ connect_to_elephant.sh

When prompted to confirm that you want to continue connecting, enter yes and then press Enter.


5. Run the CM_config_hosts.sh script on elephant. This script sets up the /etc/hosts file on elephant, copies that file to tiger, horse, monkey and lion, and then sets the host names for the five hosts to elephant, tiger, horse, monkey and lion.

Enter the following command on elephant:

$ CM_config_hosts.sh

When the script prompts you to enter IP addresses for your EC2 instances, enter the EC2 private IP addresses (which typically start with 10, 172, or 192.168).

When the script prompts you to confirm that you want to continue connecting, enter yes each time and then press Enter.

6. Terminate and restart the SSH session with elephant.

$ exit

$ connect_to_elephant.sh

Note: You exit and reconnect to elephant to reset the value of the $HOSTNAME variable in the shell after the CM_config_hosts.sh script changes the host name.

7. Start SSH sessions with your other four EC2 instances, logging in as the training user.

Open four more terminal windows (or tabs) on your VM, so that a total of five terminal windows are open.

In one of the new terminal windows, connect to the tiger EC2 instance.

$ connect_to_tiger.sh

When the script prompts you to confirm that you want to continue connecting, enter yes and then press Enter. (You will also need to do this when you connect to horse and monkey.)


In another terminal window, connect to the horse EC2 instance.

$ connect_to_horse.sh

In another terminal window, connect to the monkey EC2 instance.

$ connect_to_monkey.sh

In the remaining terminal window, connect to the lion EC2 instance.

$ connect_to_lion.sh

8. Verify that you can communicate with all the hosts in your cluster from elephant by using the host names.

On elephant:

$ ping elephant

$ ping tiger

$ ping horse

$ ping monkey

$ ping lion
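On Linux, ping keeps sending packets until you interrupt it with Ctrl+C, so each of the commands above must be stopped by hand. As an alternative, the sketch below probes each host once and prints a one-line result. The function name check_hosts and the PING_CMD override are illustrative; the -c and -W options are those of the Linux iputils ping.

```shell
# Probe each host once and report the result.
# PING_CMD can be overridden; the default sends one packet and waits
# up to two seconds for a reply (Linux iputils ping options).
check_hosts() {
  for h in "$@"; do
    if ${PING_CMD:-ping -c 1 -W 2} "$h" >/dev/null 2>&1; then
      echo "$h ok"
    else
      echo "$h FAILED"
    fi
  done
}

# On elephant:
#   check_hosts elephant tiger horse monkey lion
```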

9. Verify that passwordless SSH works by running the ip addr command on tiger, horse, monkey and lion from a session on elephant.

On elephant:

$ ssh training@tiger ip addr

$ ssh training@horse ip addr

$ ssh training@monkey ip addr

$ ssh training@lion ip addr

The ip addr command shows you IP addresses and properties.


Note: Your environment is set up to allow you to use passwordless SSH to submit commands from a session on elephant to the root and training users on tiger, horse, and monkey, and to allow you to use the scp command to copy files from elephant to the other four hosts. Passwordless SSH and scp are not required to deploy a Hadoop cluster. We make these tools available in the classroom environment as a convenience.

10. Run the uname -n command to verify that all five host names have been changed as expected.

On all five hosts:

$ uname -n

11. Verify that the value of the $HOSTNAME environment variable has been set to the new host name.

On all five hosts:

$ echo $HOSTNAME

The value elephant, tiger, horse, monkey or lion should appear.
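Because passwordless SSH is available from elephant, the host name check can also be driven from a single session. The following is a sketch under that assumption. The function name verify_hostnames and the RUN_REMOTE override are hypothetical and are not part of the course scripts.

```shell
# Compare the name each host reports with the name we expect.
# RUN_REMOTE defaults to ssh; it can be overridden for a dry run.
verify_hostnames() {
  for h in "$@"; do
    actual=$(${RUN_REMOTE:-ssh} "training@$h" uname -n 2>/dev/null)
    if [ "$actual" = "$h" ]; then
      echo "$h ok"
    else
      echo "$h MISMATCH (got '${actual}')"
    fi
  done
}

# On elephant:
#   verify_hostnames tiger horse monkey lion
```

A mismatch usually means the host name change did not take effect on that host, in which case rerunning CM_config_hosts.sh is the first thing to try.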

12. Start a SOCKS5 proxy server on the VM. The browser on your VM will use the proxy to communicate with your EC2 instances.

Open one more terminal window on the local VM – not a tab in an existing window – and enter the following command:

$ start_SOCKS5_proxy.sh

When the script prompts you to confirm that you want to continue connecting, enter yes and then press Enter.

13. Minimize the terminal window in which you started the SOCKS5 proxy server.


Important: Do not close this terminal window or exit the SSH session running in it while you are working on exercises. If you accidentally terminate the proxy server, restart it by opening a new terminal window and rerunning the start_SOCKS5_proxy.sh script.

At the End of Each Day

At the end of the day, exit all active SSH sessions you have with EC2 instances (including the session running the SOCKS5 proxy server). To exit an SSH session, simply execute the exit command.

CAUTION: Do not shut down your EC2 instances.

Then suspend the Get2EC2 VM.

When you return in the morning, resume the Get2EC2 VM, reestablish SSH sessions with the EC2 instances, and restart the SOCKS5 proxy server.

To reestablish SSH sessions with the EC2 instances, open a terminal window (or tab), and then execute the appropriate connect_to script.

To restart the SOCKS5 proxy server, open a separate terminal window, and then execute the start_SOCKS5_proxy.sh script.

This is the end of the setup activity for the training environment.


Hands-On Exercise: Installing Cloudera Manager Server

With Cloudera Manager, you can install your Hadoop cluster using one of two installation options, referred to as Installation Path A and Installation Path B in the Cloudera Manager documentation. Installation Path A is driven entirely by GUI-based wizards and is appropriate for a demo or proof of concept installation. Installation Path B is command-line driven and is appropriate for production deployments.

For this installation, you will use Installation Path B to install Cloudera Manager Server on the lion machine. Before installing Cloudera Manager you will configure an external database (MySQL) to be used by Cloudera Manager and CDH, which you will install in the next exercise.

Complete all steps in this exercise on lion.

Verify Existing Environment

1. Verify the Oracle JDK is installed and that JAVA_HOME is defined and referenced in the system PATH.

On lion:

$ java -version

$ echo $JAVA_HOME

$ env | grep PATH
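Reading the env output by eye works, but the same check can be scripted. The sketch below assumes that "referenced in the system PATH" means the $JAVA_HOME/bin directory appears as a PATH entry. The function name java_home_on_path is illustrative.

```shell
# Succeed if JAVA_HOME is set and $JAVA_HOME/bin is a PATH entry.
java_home_on_path() {
  [ -n "${JAVA_HOME:-}" ] || return 1
  case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if java_home_on_path; then
  echo "JAVA_HOME/bin is on PATH"
else
  echo "JAVA_HOME/bin is NOT on PATH"
fi
```

Wrapping PATH in colons lets the pattern match the first and last entries as well as those in the middle.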

2. Verify Python is installed. It is a requirement for Hue, which you will install later in the course.

On lion:

$ sudo rpm -qa | grep python-2.6


3. Verify Oracle MySQL Server is installed and running on lion.

On lion:

$ chkconfig | grep mysqld

$ sudo service mysqld status

Note that in a true production deployment you would also modify the /etc/my.cnf MySQL configurations and move the InnoDB log files to a backup location.

4. Verify the MySQL JDBC Connector is installed. Sqoop (a part of CDH that you will install in this course) does not ship with a JDBC connector, but does require one.

On lion:

$ ls -l /usr/share/java
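The directory listing may contain many JAR files. To narrow it to likely MySQL driver files, a filter such as the following can help. The mysql-connector name pattern is an assumption about how the driver is packaged, and the function name find_mysql_connector is hypothetical.

```shell
# List JAR files in a directory whose names suggest a MySQL JDBC driver.
# The "mysql-connector" name pattern is an assumption about packaging.
find_mysql_connector() {
  ls "${1:-/usr/share/java}" 2>/dev/null | grep -i 'mysql-connector.*\.jar$'
}

find_mysql_connector || echo "no MySQL connector JAR found"
```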

Configure the External Database

In keeping with the approach to installation that is appropriate for a production cluster environment, you will use an external database rather than the embedded PostgreSQL database option.

1. Create the required databases in MySQL.

On lion, open ~/training_materials/admin/scripts/mysql-setup.sql in a text editor and observe the script details. This script creates databases for Cloudera Manager, the Hive Metastore, the Activity Monitor, and the Reports Manager.

Note that if you were going to add the Sentry Server CDH service or install Cloudera Navigator, additional databases would also need to be created.

On lion:


$ mysql -u root < \

~/training_materials/admin/scripts/mysql-setup.sql

$ mysql -u root

Now, at the mysql> prompt on lion, issue the following commands:

show databases;

exit;

The ‘show databases’ command should show that the four databases were created.

Note: in a true production deployment you would also regularly back up your database.
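The same check can be run without entering the interactive prompt. The sketch below filters out MySQL's built-in schemas so that only the databases created by the setup script remain. The function name user_databases is illustrative, and the list of built-in schema names is an assumption about this MySQL version. At this point in the exercise the root user has no password, so no -p option is needed.

```shell
# Read database names on stdin and drop MySQL's built-in schemas,
# leaving only databases created by a setup script.
user_databases() {
  grep -v -x -E 'information_schema|mysql|performance_schema|test'
}

# On lion (-N suppresses the column header, -e runs one statement):
#   mysql -u root -N -e 'show databases;' | user_databases
```

If the setup script ran correctly, the filtered list should contain four names.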

2. Make your MySQL installation secure and set the root user password.

On lion:


$ sudo /usr/bin/mysql_secure_installation

[...]

Enter current password for root (enter for none):

OK, successfully used password, moving on...

[...]

Set root password? [Y/n] Y

New password: training

Re-enter new password: training

Remove anonymous users? [Y/n] Y

[...]

Disallow root login remotely? [Y/n] Y

[...]

Remove test database and access to it [Y/n] Y

[...]

Reload privilege tables now? [Y/n] Y

All done!

3. Verify the Cloudera Manager local software repository.

Your instances contain a local yum repository of Cloudera Manager 5 software to save download time in this course.

CentOS (and Red Hat) store software repository references in /etc/yum.repos.d. Issue the command below to view the settings.

On lion:

$ cat /etc/yum.repos.d/cloudera-cm5.repo

View the software packages in a subfolder of the referenced directory.

On lion:


$ ls ~/software/cloudera-cm5/RPMS/x86_64

Note that these two locations exist on each of the five machines in your environment and have also been made available on HTTP ports 8050 and 8000 respectively via a Linux service. This setup is specific to the course environment and is not required for Cloudera Manager or CDH installations.

If you wanted to install from the online repository, you would create a reference to the Cloudera repository in your /etc/yum.repos.d directory.
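A repository reference is a small text file with a section name, a descriptive name, and a base URL. The sketch below prints such a stanza in the general yum format. The function name repo_stanza and every value passed to it are placeholders, and the output is not the content of the course's cloudera-cm5.repo file or of Cloudera's published one.

```shell
# Print a yum repository stanza. All argument values are supplied by the
# caller; the ones used below are placeholders, not the course's file.
repo_stanza() {
  printf '[%s]\nname=%s\nbaseurl=%s\nenabled=1\ngpgcheck=0\n' "$1" "$2" "$3"
}

repo_stanza example-cm5 "Example Cloudera Manager repo" http://repo.example.com/cm5/
```

The stanza sets gpgcheck=0 only to keep the sketch short. A production repository file would normally enable signature checking and name a gpgkey.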

Install Cloudera Manager Server

1. Install Cloudera Manager Server.

On lion:

$ sudo yum install -y cloudera-manager-daemons \

cloudera-manager-server

Note: The -y option provides an answer of yes in response to an expected confirmation prompt.

2. Set the Cloudera Manager Server service to not start on boot.

On lion:

$ sudo chkconfig cloudera-scm-server off

3. Run the script to prepare the Cloudera Manager database.

On lion:

$ sudo /usr/share/cmf/schema/scm_prepare_database.sh \

mysql cmserver cmserveruser password

After running the command above you should see the message, “All done, your SCM database is configured correctly!”
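The four arguments in the command above are, in order, the database type, the database name, the database user, and that user's password. To the best of our knowledge the script records these settings in /etc/cloudera-scm-server/db.properties; treat that path and the key name used below as assumptions to confirm on your own system. The helper prop_value is a hypothetical name for a small reader of key=value files.

```shell
# Print the value of one key from a Java-style properties file.
prop_value() {
  awk -F= -v k="$1" '$1 == k { print substr($0, length(k) + 2) }' "$2"
}

# On lion (the file is typically readable only with sudo):
#   sudo cat /etc/cloudera-scm-server/db.properties
```

For example, after copying the file to a readable location, prop_value com.cloudera.cmf.db.name FILE would be expected to print cmserver.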


4. Verify the local software repositories.

Run the following two commands to verify the URLs are working.

On lion:

$ curl lion:8000

$ curl lion:8050

Each command should show hyperlinks (<a href=”…” code) to software repositories. If either command did not successfully contact the web server, discuss with the instructor before continuing.
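The raw HTML can be hard to read in a terminal. As a convenience, a filter like the following prints only the link targets. The function name extract_links is illustrative, and the sketch assumes the page uses double-quoted href attributes.

```shell
# Read HTML on stdin and print the target of each href attribute.
extract_links() {
  grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

# On lion:
#   curl -s lion:8000 | extract_links
#   curl -s lion:8050 | extract_links
```

An empty result from either command suggests the web server did not return a directory listing.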

5. Start the Cloudera Manager Server.

On lion:

$ sudo service cloudera-scm-server start

6. Review the processes started by the Cloudera Manager Server.

On lion:

$ top

Cloudera Manager Server runs as a Java process that you can view by using the top Linux utility. Notice that CPU usage is at 90 percent or above for a few seconds while the server starts. Once the CPU usage drops, the Cloudera Manager browser interface will become available.

On lion:

$ ps wax | grep [c]loudera-scm-server

The results of the ps command above show that Cloudera Manager Server is using the JDBC MySQL connector to connect to MySQL. They also show logging configuration and other details.

7. Review the Cloudera Manager Server log file.


The path to the log file is /var/log/cloudera-scm-server/cloudera-scm-server.log. Note that you must use sudo to access Cloudera Manager logs because of restricted permissions on the Cloudera Manager log file directories.

On lion:

$ sudo less /var/log/cloudera-scm-server/cloudera-scm-\

server.log
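Paging through the whole log is useful the first time. For a quick health check, counting lines by severity can be enough. The sketch below assumes the log marks severities with words such as ERROR and WARN, which is common for Java logging but not confirmed by this manual. The function name count_level is hypothetical.

```shell
# Count lines on stdin that contain the given level keyword as a word.
# grep exits non-zero when the count is 0, so "|| true" keeps the
# function usable in scripts that stop on errors.
count_level() {
  grep -c -w "$1" || true
}

# On lion:
#   sudo cat /var/log/cloudera-scm-server/cloudera-scm-server.log | count_level ERROR
```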

Note: You will log in to Cloudera Manager in the next exercise.

This is the end of the exercise.


Hands-on Exercise: Creating a Hadoop Cluster

In this task, you will log in to the Cloudera Manager Admin Console and use the Cloudera Manager services wizard to create a Hadoop cluster. The wizard will prompt you to identify the machines to be managed by Cloudera Manager. It will then install the Cloudera Manager Agent on each machine. At that point, your environment will be ready for the CDH software to be installed.

You will then be prompted to choose which CDH services you want to add in the cluster and to which machines you would like to add each service.

At the end of this exercise, you will have Hadoop daemons deployed across your cluster as depicted here (daemons added in this exercise shown in blue).


In subsequent exercises, you will add more Hadoop services to your cluster.

Because you have only five hosts to work with, you will have to run multiple Hadoop daemons on all the hosts except lion, which will be used for Cloudera Manager services only. For example, the NameNode, a DataNode and a NodeManager will all run on elephant. Having a very limited number of hosts is not unusual when deploying Hadoop for a proof-of-concept project. Please follow the best practices in the “Planning Your Hadoop Cluster” chapter when sizing and deploying your production clusters.

After completing the installation steps, you will review a Cloudera Manager Agent log file and review processes running on a machine in the cluster. Finally, there is a section at the end of this exercise that provides a brief tour of the Cloudera Manager Admin UI.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command on elephant and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Login to Cloudera Manager Admin UI

1. Verify that you are running a SOCKS5 proxy server started with the start_SOCKS5_proxy.sh script. The proxy server should be running in a separate terminal window on the Get2EC2 VM.

2. Start the Cloudera Manager Admin Console from a browser. The URL is http://lion:7180.

If an Unable to Connect message appears, the Cloudera Manager server has not yet fully started. Wait several moments, and then attempt to connect again.
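Instead of reloading the browser by hand, you can poll the server from a terminal on lion, where no proxy is involved. The retry helper below is a generic sketch. The function name wait_until and the attempt and delay values in the usage line are arbitrary choices.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# On lion (30 tries, 10 seconds apart):
#   wait_until 30 10 curl -s -f http://lion:7180 && echo "Admin Console is up"
```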

3. Log in with the username admin. The password is admin.


The “Welcome to Cloudera Manager” page appears with a table that provides details about editions of Cloudera Manager software.

Install Cloudera Manager Agents

1. A green box highlighting a product edition column should appear.

Select the Cloudera Enterprise Data Hub Edition Trial.

Click Continue.

2. The “Thank you for choosing Cloudera Manager and CDH” page appears.

Click Continue.

3. The “Specify hosts for your CDH cluster installation” page appears.

Type in the names of all five machines: elephant tiger horse monkey lion

Click Search. All five machines should be found. Ensure they are all selected.


Click Continue.

4. The “Cluster Installation – Select Repository” page appears.

Specify the following option:

• Choose Method - Use Parcels

Click the “More Options” button.

• In the “Remote Parcel Repository URLs” area, remove all the current lines by using the minus (-) sign button.

• Once the existing entries are all removed, click on the plus (+) sign to add a new URL.

• In the blank field that appears, type in http://lion:8000

• Click OK to return to the Select Repository page.

The ‘Select Repository’ page should now show that CDH-5.x.x-x.cdh5.x.x.px.xx (where each x is a version number) is available. Select this version of CDH.

In the “Select the specific release of the Cloudera Manager Agent you want to install on your hosts” area choose “Custom Repository”.

• In the blank field that appears, type in http://lion:8050

Click Continue.

5. The “Cluster Installation – JDK Installation Options” page appears.

A supported version of the Oracle JDK is already installed.

Verify the box is unchecked so that the installation of JDK will be skipped.


Click Continue.

6. The “Cluster Installation – Enable Single User Mode” page appears.

Keep the default setting (Single User Mode not enabled) and click Continue.

7. The “Cluster Installation – Provide SSH login credentials” page appears.

Keep “Login To All Hosts As” set to root.

For “Authentication Method”, choose “All hosts accept same private key”.

Click the Browse button. In the “Places” column choose “training”. Then, in the “Name” area, right-click (or on a Mac, Ctrl-click), and select “Show Hidden Files.” Finally, click into the .ssh directory, select the id_rsa file and click Open.

Leave the passphrase fields blank and click Continue.

When prompted to continue with no passphrase, click OK.

8. The “Cluster Installation – Installation in progress” page appears.

Cloudera Manager installs the Cloudera Manager Agent on each machine.

Once the installation successfully completes on all machines, click Continue.

Install Hadoop Cluster

1. The “Cluster Installation – Installing Selected Parcels” page appears.

The CDH parcel is downloaded, distributed, and activated on all hosts in the cluster.

The “distributing” action may take a few minutes since this involves copying the large CDH parcel file from the lion machine to all of the other machines in the cluster.

When it completes, click Continue.

2. The “Cluster Installation – Inspect hosts for correctness” page appears.

After a moment, output from the Host Inspector appears.

Click Finish.

3. The “Cluster Setup - Choose the CDH 5 services that you want to install on your cluster” page appears.

Click “Custom Services”.

A table appears with a list of CDH service types.

4. Select the HDFS and YARN (MR2 Included) service types.

You will add additional services in later exercises.

Click Continue.

5. The Cluster Setup – Customize Role Assignments page appears.

Assign the following roles.

Role                   Node(s)

HDFS
  NameNode             elephant
  SecondaryNameNode    tiger
  Balancer             horse
  HttpFS               Do not specify any hosts
  NFS Gateway          Do not specify any hosts
  DataNode             elephant, tiger, horse, monkey, but not lion; the order in which the hosts are specified is not significant

Cloudera Management Service
  Service Monitor      lion
  Activity Monitor     lion
  Host Monitor         lion
  Reports Manager      lion
  Event Server         lion
  Alert Publisher      lion

YARN (MR2 Included)
  ResourceManager      horse
  JobHistory Server    monkey
  NodeManager          Same as DataNode (elephant, tiger, horse, monkey, but not lion)

To assign a role, click the fields with one or more host names in them. For example, the field under NameNode initially has the value lion. To change the role assignment for the NameNode to elephant, click the field under NameNode (that has lion in it). A list of hosts appears. Select elephant, and then click OK. The field under NameNode now has the value elephant in it.

When you have finished assigning roles, compare your role assignments to the role assignments in the screenshot on the next page.

Verify that your role assignments are correct.

When you are certain that you have the correct role assignments, click Continue and proceed to the step after the screenshot.

6. The “Cluster Setup – Database Setup” page appears.

Fill in the details as shown here.

Database Host Name   Database Type   Database Name   Username   Password
lion                 MySQL           amon            amonuser   password
lion                 MySQL           rman            rmanuser   password

Click “Test Connection” to verify that Cloudera Manager can connect to the MySQL databases you created in an earlier exercise in this course.

After you have verified that all connections are successful, click Continue.

7. The Cluster Setup – Review Changes page appears.

Specify the following values (change lib to log):

• Host Monitor Storage Directory – /var/log/cloudera-host-monitor

• Service Monitor Storage Directory – /var/log/cloudera-service-monitor

Click Continue.

8. The “Cluster Setup – Progress” page appears.

Progress messages appear while cluster services are being created and starting.

The following is a summary of what Cloudera Manager’s Cluster Setup wizard lists that it does for you:

• Formats the HDFS NameNode
• Starts the HDFS service
• Creates a /tmp directory in HDFS
• Creates a MR2 Job History directory in HDFS
• Creates a YARN NodeManager remote application log directory in HDFS
• Starts the YARN service
• Starts the Cloudera Manager Service service
• Deploys all client configurations

When all the cluster services have started, click Continue.

9. The “Cluster Setup – Congratulations!” page appears.

The page indicates that services have been added and are now configured and running. Click Finish.

The Cloudera Manager Home page appears. Cluster installation is now complete.

10. A note regarding the configuration warnings.

The configuration warnings, highlighted in red in the screenshot below, are expected. They indicate that although the services are in good health, they are operating in low memory conditions. In a production deployment you would want to ensure these warnings were addressed. However, in our training environment these warnings can be safely ignored.

Examine Running Processes on a Cluster Node

1. Review operating system services on a host in the cluster.

On elephant:

$ chkconfig | grep cloudera

$ sudo service cloudera-scm-agent status

You have already added YARN (Including MR2) and HDFS services, yet the only service registered with init scripts at the operating system level is the cloudera-scm-agent service.

With Cloudera Manager managed clusters, the cloudera-scm-agent service on each cluster node manages starting and stopping the deployed Hadoop daemons.

2. Review the running Java processes on a host in the cluster.

On elephant:

$ sudo jps

The Hadoop daemons run as Java processes. You should see the NodeManager, NameNode, and DataNode processes running.

Examine the details of one of the running CDH daemons.

$ sudo ps -ef | grep NAMENODE

You will see details about the process.

If you grep for DATANODE or NODEMANAGER you will see details on those processes. Grep for ‘cloudera’ to see all CDH and Cloudera Manager processes currently running on the machine.
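
The check for the three expected daemons can also be scripted. The sketch below is only an illustration: the function name missing_daemons is hypothetical, and it assumes jps prints one "<pid> <name>" pair per line.

```shell
# missing_daemons NAME...: read `jps` output on stdin and print every expected
# daemon name that does not appear. Returns 0 only when nothing is missing.
# (Hypothetical helper; assumes each jps line is "<pid> <name>".)
missing_daemons() {
  jps_output=$(cat)
  status=0
  for daemon in "$@"; do
    # Exact match on the second field, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

On elephant it could be used as: sudo jps | missing_daemons NameNode DataNode NodeManager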

Testing Your Hadoop Installation

We will now test the Hadoop installation by uploading some data. The hdfs dfs command you use in this step will be explored in greater detail in the next exercise.

1. Upload some data to the cluster.

The commands below have you unzip a file on the local drive and then upload it to HDFS’ /tmp directory.

On elephant:

$ cd ~/training_materials/admin/data

$ gunzip shakespeare.txt.gz

$ hdfs dfs -put shakespeare.txt /tmp

2. Verify that the file is now in HDFS.

In Cloudera Manager choose Clusters > HDFS. Then click on File Browser.

Browse into tmp and confirm that shakespeare.txt appears.
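
The same check can be made from the command line instead of the File Browser. This is a minimal sketch: hdfs_path_exists is a hypothetical wrapper name, and it relies only on hdfs dfs -test -e, which signals existence through its exit status.

```shell
# hdfs_path_exists PATH: succeed if PATH exists in HDFS.
# (Hypothetical wrapper around `hdfs dfs -test -e`.)
hdfs_path_exists() {
  hdfs dfs -test -e "$1"
}

# Example use on elephant (commented out so the sketch has no side effects):
# hdfs_path_exists /tmp/shakespeare.txt && echo "shakespeare.txt is in HDFS"
```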

A Brief Tour of Cloudera Manager

Now that you have the Cloudera Manager managed cluster running, let’s briefly explore a few areas in the Admin console.

On the Home page, you will see the cluster named Cluster 1 that you created.

1. Click on the drop down menu to the right of Cluster 1 to see many of the actions you can perform on the cluster, such as Add a Service, Stop, and Restart.

2. Click Hosts to view the current status of each managed host in the cluster. Clicking on the > icon in the Roles column for a host will reveal which roles are deployed on that host.

In this exercise and the exercises that follow you will discover many other areas of Cloudera Manager that will prove useful for administering your Hadoop cluster(s).

This is the end of the exercise.

Hands-On Exercise: Working With HDFS

In this Hands-On Exercise you will copy a large Web server log file into HDFS and explore the results in the Hadoop NameNode Web UI and the Linux file system. You will then create a snapshot of a directory in HDFS, delete some data, and then restore it back to its original location in HDFS.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Perform all Command Line steps in this exercise on elephant.

Confirm HDFS Processes and Settings

1. Confirm that all HDFS processes are running.

From the Cloudera Manager Home page click on HDFS and then click on the Instances tab.

Notice that the three daemons that support HDFS – the NameNode, SecondaryNameNode, and DataNode daemons – are running. In fact there are DataNodes running on four hosts.

2. Determine the current HDFS replication factor.

From the Cloudera Manager Home page click into the Search box and type “replication”.

Choose “HDFS: Replication Factor” which should be one of the search results.

You are taken to the HDFS Configuration page where you will find the dfs.replication setting that has the default value of 3.

3. Similarly, use the search box in the HDFS configuration page, and search for “block size” to discover the HDFS Block Size setting which defaults to 128 MiB.

Add Directories and Files to HDFS

1. As the hdfs user, create a home directory for the training user on HDFS and give the training user ownership of it.

On elephant:

$ sudo -u hdfs hdfs dfs -mkdir -p /user/training/

$ sudo -u hdfs hdfs dfs -chown training /user/training

2. Create an HDFS directory called /user/training/weblog, in which you will store the access log.

On elephant:

$ hdfs dfs -mkdir weblog

3. Extract the access_log.gz file and upload it to HDFS in a single step.

On elephant:

$ cd ~/training_materials/admin/data

$ gunzip -c access_log.gz \
    | hdfs dfs -put - weblog/access_log

The put command uploaded the file to HDFS. The dash after put tells the command to read its input from stdin; the data is written to the destination file, weblog/access_log.

Since the file size is 504 MB, HDFS will have split it into multiple blocks. Let’s explore how the file was stored.

4. Run the hdfs dfs -ls command to review the file’s permissions in HDFS.

On elephant:

$ hdfs dfs -ls /user/training/weblog
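
Before looking at how the file was stored, the expected block layout can be worked out by hand. The sketch below is illustrative only: block_layout is a hypothetical helper, and treating the 504 MB figure as 504 MiB is an assumption.

```shell
# block_layout FILE_BYTES BLOCK_BYTES: print "<block count> <last block bytes>".
# (Hypothetical helper; plain integer arithmetic, no HDFS access.)
block_layout() {
  file_bytes=$1
  block_bytes=$2
  if [ "$file_bytes" -eq 0 ]; then
    echo "0 0"
    return
  fi
  full=$(( file_bytes / block_bytes ))
  rest=$(( file_bytes % block_bytes ))
  if [ "$rest" -eq 0 ]; then
    # The file fills its last block exactly.
    echo "$full $block_bytes"
  else
    # One extra, partially filled block holds the remainder.
    echo "$(( full + 1 )) $rest"
  fi
}

# Assumed sizes: a 504 MiB file and the 128 MiB block size.
block_layout $(( 504 * 1024 * 1024 )) $(( 128 * 1024 * 1024 ))   # prints: 4 125829120
```

Under these assumptions the file occupies three full 128 MiB blocks plus one smaller block of 120 MiB. With a replication factor of 3, each of those blocks is stored three times across the DataNodes.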

Analyze File Storage in HDFS and On Disk

1. Locate information about the access_log file in the NameNode Web UI.

a. From the Cloudera Manager Clusters menu, choose HDFS for your cluster.

b. In the HDFS Status page, click on “NameNode Web UI (Active)”. This will open the NameNode Web UI at http://elephant:50070.

c. Choose Utilities > “Browse the file system,” then navigate into the /user/training/weblog directory.

d. Notice that the permissions shown for the access_log file are identical to the permissions you observed when you ran the hdfs dfs -ls command.

e. Now select the access_log file in the NameNode Web UI file browser to bring up the File information window.

Notice that the Block Information drop down list has four entries, Block 0, Block 1, Block 2, and Block 3. This makes sense because, as you discovered, the HDFS Block Size on your cluster is 128 MB and the extracted access_log file is 504 MB.

Blocks 0, 1, and 2 all show a size of 134217728 (or 128 MB) in accordance with the size specified in the HDFS block size setting you observed earlier in this exercise. Block 3 is smaller than the others, as you can see in the “Size” details if you choose it.

Notice also that each block is available on three different hosts in the cluster. This is what you would expect since three is the current (and default) replication factor in HDFS. Also notice that each block may be replicated to different DataNodes than the other blocks that make up the file.

a. Choose Block 0 and write down the value that appears in the Size field. If Block 0 is not replicated on elephant, then choose another block that is replicated on elephant.

________________________________________________________________________________

You will need this value when you examine the Linux file that contains this block.

f. Select the value of the Block ID field and copy it (Edit menu > Copy). You will need this value for the next step in this exercise.

2. Locate the HDFS block on the elephant Linux file system.

a. In Cloudera Manager’s HDFS Configuration page, conduct a search for “Data Directory”. You will see that the DataNode Data Directory is /dfs/dn.

b. Let’s find the blocks stored in that directory. On elephant:

$ sudo find /dfs/dn -name '*BLKID*' -ls

where BLKID is the actual Block ID you copied from the NameNode Web UI.

c. Verify that two files with the Block ID you copied appear in the find command output – one file with an extension, .meta, and another file without this extension.

d. Verify in the results of the find command output that the size of the file containing the HDFS block is exactly the size that was reported in the NameNode Web UI.

3. Starting any Linux editor with sudo, open the file containing the HDFS block. Verify that the first few lines of the file match the first chunk of the access_log file content.

Note: You must start your editor with sudo because you are logged in to Linux as the training user, and this user does not have privileges to access the Linux file that contains the HDFS block.

$ sudo head /dfs/dn/path/to/block

Note: Replace /path/to/block in the command above with the actual path to the block as shown in the results of the find command you ran in the previous step.

You can review the access_log file content on HDFS as follows:

$ hdfs dfs -cat weblog/access_log | head -

The results returned by the last two commands should match exactly.
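
Rather than comparing the two outputs by eye, each could be saved to a file and compared. This is a minimal sketch: same_head and the /tmp file names in the usage note are hypothetical.

```shell
# same_head FILE_A FILE_B [N]: succeed if the first N lines (default 10)
# of the two files are identical. (Hypothetical helper.)
same_head() {
  n=${3:-10}
  [ "$(head -n "$n" "$1")" = "$(head -n "$n" "$2")" ]
}
```

For example, after redirecting the output of the sudo head command to /tmp/block_head.txt and the output of the hdfs dfs -cat pipeline to /tmp/hdfs_head.txt, running same_head /tmp/block_head.txt /tmp/hdfs_head.txt && echo match would confirm the match. This only holds for the block that contains the start of the file.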

This is the end of the exercise.

Hands-On Exercise: Running YARN Applications

In this exercise you will run two YARN applications on your cluster. The first application is a MapReduce job. The second is an Apache Spark application. You will add the Spark service on your cluster before running the Spark application. After completing this exercise, your cluster will have the following components installed (items installed in this exercise highlighted in blue):

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Perform all steps in this exercise on elephant.

Submitting a MapReduce Application to Your Cluster

We will now test the Hadoop installation by running a sample Hadoop application that ships with the Hadoop source code. This is WordCount, a classic MapReduce program. We’ll run the WordCount program against the Shakespeare data added to HDFS in a previous exercise.

1. Since the code for the application we want to execute is in a Java Archive (JAR) file, we’ll use the hadoop jar command to submit it to the cluster. Like many MapReduce programs, WordCount accepts two additional arguments: the HDFS directory path containing input and the HDFS directory path into which output should be placed. Therefore, we can run the WordCount program with the following command.

On elephant:

$ hadoop jar /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    wordcount /tmp/shakespeare.txt counts

Verifying MapReduce Job Output

2. Once the program has completed you can inspect the output by listing the contents of the output (counts) directory.

On elephant:

$ hdfs dfs -ls counts

3. This directory should show all the data output for the job. Job output will include a _SUCCESS flag and one file created by each Reducer that ran. You can view the output by using the hdfs dfs -cat command.

On elephant:

$ hdfs dfs -cat counts/part-r-00000
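
The reducer output is long, so it can help to look only at the most frequent words. The sketch below assumes each output line has the form "word<TAB>count"; top_words is a hypothetical helper name.

```shell
# top_words [N]: read "word<TAB>count" lines on stdin and print the N lines
# (default 10) with the highest counts, ties broken alphabetically.
# (Hypothetical helper; assumes tab-separated WordCount output.)
top_words() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "${1:-10}"
}
```

On elephant it could be used as: hdfs dfs -cat counts/part-r-00000 | top_words 20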

Review the MapReduce Application Details and Logs

In this task you will start by looking at details in the YARN Applications area of Cloudera Manager. The Application Details link in Cloudera Manager will then take you to the History Server Web UI at http://monkey:19888.

As you go through the steps below, see if you can reconstruct what occurred where when you ran the MapReduce job, by creating a chart like the one below.

                      Node(s)
Application Master
Map Task(s)
Reduce Task(s)

1. Locate your MapReduce application in Cloudera Manager.

In Cloudera Manager, choose Clusters > YARN (MR2 Included), then click on Applications.

In the Results tab that displays, you should see the MapReduce application that just ran.

Note: There will be an entry for each completed MapReduce job that you have run on your cluster within the time frame of your search. The default search is for applications run within the last 30 minutes.

2. Access the History Server Web UI to discover where the Application Master ran.

From the drop down menu for the “word count” application, choose “Application Details.”

This action will open a page in the History Server Web UI with details about the job.

3. Locate where the Application Master ran and view the log.

Notice the ApplicationMaster section shows which cluster node ran the MapReduce Application Master. Click the “logs” link to view the ApplicationMaster log.

Notice also the number of map and reduce tasks that ran in order to complete the word count job. The number of reducers run by the job should correspond to the number of part-r-##### files you saw when you ran the hdfs dfs -ls command earlier. There are no part-m-##### files because the job ran at least one reducer.

4. Locate where the mapper task ran and view the log.

From the History Server Web UI’s “Job” menu choose “Map tasks”.

From the Map Tasks table, click on the link in the Name column for the task.

The Attempts table displays. Notice the “Node” column shows you where the map task attempt ran.

Click the “logs” link and review the contents of the mapper task log. When done, click the browser back button to return to the previous page.

5. Locate where the reduce tasks ran and view the logs.

From the History Server Web UI’s Job menu choose “Reduce tasks.”

From the Reduce Tasks table, click on the link in the Name column for one of the tasks.

The Attempts table displays. Notice the “Node” column shows you where this reducer task ran.

Click the “logs” link and review the contents of the log. Observe the amount of output in the Reducer task log. When done, click the browser back button to return to the previous page.

6. Determine the log level for Reducer tasks for the word count job.

Expand the Job menu and choose “Configuration.”

Twenty entries from the job configuration that were in effect when the word count job ran appear.

In the Search field, enter log.level.

Locate the mapreduce.reduce.log.level property. Its value should be INFO.

Note: INFO is the default value for the “JobHistory Server Logging Threshold”, which can be found in the Cloudera Manager YARN Configuration page for your cluster.

Run the MapReduce Application with a Custom Log Level Setting

1. Remove the counts directory from HDFS and rerun the WordCount program, this time passing it a log level argument.

On elephant:

$ hdfs dfs -rm -r counts

$ hadoop jar /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    wordcount \
    -D mapreduce.reduce.log.level=DEBUG \
    /tmp/shakespeare.txt counts

Note: You must delete the counts directory before running the WordCount program a second time because MapReduce will not run if you specify an output path which already exists.

MapReduce programs coded to take advantage of the Hadoop ToolRunner allow you to pass several types of arguments to Hadoop, including run-time configuration parameters. The hadoop jar command shown above sets the log level for reduce tasks to DEBUG.

Note: The -D option, as used in the hadoop jar command above, allows you to override a default property setting by specifying the property and the value you want to assign.

When your job is running, look for a line in standard output similar to the following:

14/12/09 05:47:16 INFO mapreduce.Job: Running job: job_1391249780844_0004

2. After the job completes, locate and view one of the reducer logs.

From Cloudera Manager’s YARN Applications page for your cluster, locate the entry for the application that you just ran.

Click on the ID link, and use the information available under the Job menu’s “Configuration” and “Reduce tasks” links to verify the following:

• The value of the mapreduce.reduce.log.level configuration attribute is DEBUG.

• The Reducer task’s logs for this job contain DEBUG log records, and the logs contain more records than were written to the Reducer task’s logs during the previous WordCount job execution.

3. Verify the results of the word count job were written to HDFS using any of the following three methods.

Option 1: In Cloudera Manager, browse to the HDFS page for your cluster, then choose File Browser. Drill down into /user/training/counts.

Option 2: Access the HDFS NameNode Web UI at http://elephant:50070. Choose Utilities > “Browse the file system”, and navigate to the /user/training/counts directory.

Option 3: On any machine in the cluster that has the DataNode installed (elephant, tiger, monkey, or horse) run the following command in a terminal:

$ hdfs dfs -tail counts/part-r-00000
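
The “Running job:” line printed while the job runs is a convenient source for the job's identifier. A MapReduce job ID and the corresponding YARN application ID share the same numeric suffix, so one can be derived from the other. The sketch below shows this; job_to_app_id is a hypothetical helper name.

```shell
# job_to_app_id: read job client output on stdin, take the first job ID found,
# and print the matching YARN application ID. (Hypothetical helper.)
job_to_app_id() {
  grep -o 'job_[0-9][0-9]*_[0-9][0-9]*' | head -n 1 | sed 's/^job_/application_/'
}

echo '14/12/09 05:47:16 INFO mapreduce.Job: Running job: job_1391249780844_0004' \
  | job_to_app_id   # prints: application_1391249780844_0004
```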

Add the Apache Spark Service

In this task, you will add the Spark service to your cluster using Cloudera Manager. You will then run a Spark application.

1. In Cloudera Manager, navigate to the Home page.

2. Select the “Add a Service” option from the drop-down menu for Cluster 1.

The Add Service Wizard appears.

3. Select Spark and click Continue.

The “Select the set of dependencies for your new service” page appears.

4. Select the row containing the hdfs and yarn services, then click Continue.

The Customize Role Assignments page appears.

5. Specify host assignments as follows:

• History Server – monkey only
• Gateway – monkey only

Click Continue.

6. Progress messages appear on the Progress page.

When the adding of the service has completed, click Continue.

7. The Congratulations page appears.

Click Finish.

8. The Home page appears.

A status indicator shows you that the Spark service is in good health.

Note: You may notice a configuration warning that appears next to “Hosts” on the Cloudera Manager home page. If you look into it, Cloudera Manager indicates that memory is overcommitted on host monkey. This configuration issue, along with the five other configuration warnings that appeared after the initial cluster installation, would need to be addressed in a true production cluster; however, they can be safely ignored in the classroom environment.

Run Spark as a YARN Application

You should complete these steps on monkey since monkey is where you just added the Spark gateway role.

1. Start the Spark shell and connect to the yarn-client Spark context on monkey.

Recall that the Spark Gateway service was installed on monkey so the Spark shell should be run from monkey.

On monkey:

$ spark-shell --master yarn-client

The Scala Spark shell will launch. You should eventually see the message, “Spark context available as sc.” If necessary, press the <Enter> key on your keyboard to see the scala> prompt.

2. Type in the commands below to run a word count application using the shakespeare.txt file that is already in HDFS.

This accomplishes something very similar to the job you ran in the MapReduce exercise, but this time the computational framework being used is Spark.

scala> val file = sc.textFile("hdfs://elephant:8020/tmp/shakespeare.txt")

scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).sortByKey()

scala> counts.saveAsTextFile("hdfs://elephant:8020/tmp/sparkcount")

scala> sc.stop()

scala> sys.exit()

3. View the application results written to HDFS by Spark.

On elephant:

$ hdfs dfs -cat /tmp/sparkcount/part-00000 | less
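
As a rough cross-check of the word counts, a small text sample can be counted locally with standard Unix tools. This is an approximation, not a reproduction of the Spark job. The hypothetical local_wordcount helper drops empty tokens, whereas splitting on a single space in Spark can keep some, and it prints "word<TAB>count" rather than Spark's tuple format.

```shell
# local_wordcount: read text on stdin, split it on spaces, and print
# "word<TAB>count" lines sorted by word. (Hypothetical helper; approximates
# the word count logic only, and ignores empty tokens.)
local_wordcount() {
  tr -s ' ' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ print $2 "\t" $1 }'
}
```

For example, printf 'to be or not to be\n' | local_wordcount lists be and to with a count of 2, and not and or with a count of 1.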

Review Application Details in the Spark History Server

View the Spark application details in Cloudera Manager and the Spark History Server Web UI.

1. In Cloudera Manager, go to the YARN Applications page for your cluster.

You will see a “Spark shell” application.

2. Click the link in the ID field.

A page in the YARN Resource Manager Web UI opens with details about the application.

3. Click on the History link.

A Spark Jobs page in the Spark History Server Web UI opens.

Notice that this Spark Application consisted of two jobs that are now completed.

4. In the Completed Jobs area, click on the “sortByKey…” link for the first job that ran (Job Id 0).

Notice that this first job consisted of two stages.

5. Click on the “map at…” link for the first stage (Stage 0).

The “Details for Stage 0” page appears.

Here you can see that there were two tasks in this stage and you can see on which executor and on which host each task ran. You can also see task details such as duration, input data size, and the amount of data written to disk during shuffle operations.

6. Click on the Executors tab to see a summary of all the executors used by the Spark application.

Review the Spark Application Logs

1. Access the Spark application logs from the command line.

First locate the application ID.

On elephant:

$ yarn application -list -appStates FINISHED

Copy the application ID for the Spark application returned by the command above.

Now run this command (where appId is the actual application ID).

On elephant:

$ yarn logs -applicationId appId | less

Scroll through the logs returned by the shell. Notice that the logs for all the containers that ran the Spark executors are included in the results.

These Spark application logs are stored in HDFS under /user/spark/applicationHistory.
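
Copying the application ID by hand can be avoided by filtering the listing. The sketch below is a hedged illustration: app_ids_matching is a hypothetical helper, and it assumes only that each listing line contains the application name and an ID of the form application_<digits>_<digits>.

```shell
# app_ids_matching TEXT: read an application listing on stdin and print the
# application IDs found on lines that contain TEXT. (Hypothetical helper.)
app_ids_matching() {
  grep -F -- "$1" | grep -o 'application_[0-9][0-9]*_[0-9][0-9]*'
}
```

On elephant it could be used as: yarn application -list -appStates FINISHED | app_ids_matching 'Spark shell'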

This is the end of the exercise.

Hands-On Exercise: Explore Hadoop Configurations and Daemon Logs

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Exploring Hadoop Configuration Settings

In this task, you will explore some of the Hadoop configuration files auto-generated by Cloudera Manager.

1. Go to the directory that contains the Hadoop configuration for Cloudera Manager-managed daemons running on elephant and then view the contents.

On elephant:

$ cd /var/run/cloudera-scm-agent/process

$ sudo tree

Notice how there are separate directories for each role instance running on elephant: DataNode, NameNode, and NodeManager. Notice also that some files, such as hdfs-site.xml and core-site.xml, exist in more than one daemon’s configuration directory.

2. Compare the first 20 lines of the NameNode’s copy of core-site.xml to the NodeManager’s copy of the same file.

On elephant:

$ sudo head -20 nn-hdfs-NAMENODE/core-site.xml

$ sudo head -20 nn-yarn-NODEMANAGER/core-site.xml

Inthecommandsabove,replaceeachnnwiththeactualnumbersgeneratedbyClouderaManager.

Theentriesinthesecore-site.xmlfilesreflectthesettingsconfiguredinClouderaManagerfortheseparticularroleinstances.SomeofthesettingsreflectchoicesyoumadewhenyourantheClouderaManagerinstallationwizard.OtherentriesareoptimalinitialordefaultvalueschosenbyClouderaManager.

3. Make a configuration change in Cloudera Manager.

In Cloudera Manager, browse to the HDFS page for your cluster and then choose Configuration.

Conduct a search for the word “trash”. The NameNode Default Group “Filesystem Trash Interval” setting appears.

Double-click into the Value area where it currently reads “1 day(s)”.

Change this value to 2 day(s). Click Save Changes.

Notice the “Stale Configuration - Restart needed” icon that appears on the screen. Click on it.

The “Stale Configurations - Review Changes” screen appears, showing the changes to core-site.xml that will be made.

Click “Restart Cluster”. The “Cluster 1 Stale Configurations - Restart Cluster” screen appears.

Click “Restart Now”. The “Stale Configurations - Progress” screen appears. Wait for the child commands to complete successfully. Click Finish.

4. Return to the elephant terminal window and list the contents of the /var/run/cloudera-scm-agent/process directory.

$ sudo ls -l /var/run/cloudera-scm-agent/process

Notice that there are now two directories each for NameNode, DataNode, and NodeManager. The old settings have been retained; however, the new settings have also been deployed and will now be used. The directory with the higher number in the name is the newer one.

5. Find the difference between the old NameNode core-site.xml file and the new one.

On elephant:

$ sudo diff -C 2 nn-hdfs-NAMENODE/core-site.xml \
nn-hdfs-NAMENODE/core-site.xml

The nn values above should be replaced with the actual numbers with which the configuration directories are named.

You should see that the fs.trash.interval property value change has been deployed to the new NameNode configuration file.
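To read a single property instead of scanning a diff, a small awk-based lookup is enough. This is a sketch under stated assumptions: hadoop_prop is a hypothetical helper name, and it assumes each name and value element sits on one line, as in typical Hadoop configuration files. It is a line-oriented match, not a real XML parser. Because fs.trash.interval is expressed in minutes, a change from 1 day to 2 days should show up as 1440 becoming 2880.

```shell
# Hypothetical helper (not part of the training environment).
# Usage: hadoop_prop CONFIG_FILE PROPERTY_NAME
# Prints the <value> that follows the matching <name> in a Hadoop *-site.xml file.
hadoop_prop() {
  file="$1"
  key="$2"
  awk -v k="$key" '
    /<name>/ {
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == k {
      value = $0
      sub(/.*<value>/, "", value)
      sub(/<\/value>.*/, "", value)
      print value
      exit
    }' "$file"
}
```

Running hadoop_prop against the old and the new nn-hdfs-NAMENODE/core-site.xml with fs.trash.interval as the property name gives the two values side by side.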

6. Revert the configuration change and restart the cluster.

In Cloudera Manager, go to the HDFS Configuration page for your cluster.

Click the “History and Rollback” button.

The “Configuration and Role Group History” page displays.

Under Current Revision click “Details”.

The Revision Details screen displays. Notice that the Filesystem Trash Interval property value that you just modified is listed. Choose “Revert Configuration Changes”.

A message appears indicating that the revision was reverted. Go to the HDFS Status page and notice the “Stale Configuration - Restart needed” icon.

Click on the icon and follow the steps to restart the cluster.

Examining Hadoop Daemon Log Files

In the previous exercise, you reviewed the application logs from MapReduce and Spark running as YARN applications. Here you will review Hadoop daemon log files, including the HDFS NameNode and YARN ResourceManager log files.

With Cloudera Manager, Hadoop daemons generate a .log.out file, a standard error log (stderr.log), and a standard output log (stdout.log).

In this task, you will examine Hadoop daemon log files using the NameNode Web UI, Cloudera Manager, the ResourceManager Web UI, and the NodeManager Web UI.

1. View the NameNode log file using the NameNode Web UI.

Access the NameNode Web UI from the Quick Links on the HDFS Status page in Cloudera Manager.

From the NameNode Web UI, select Utilities > Logs. The list of folders and files in the /var/log/hadoop-hdfs directory on elephant appears.

Open the NameNode log file and review the file.

2. Access the daemon logs directly from Cloudera Manager.

In Cloudera Manager, choose Hosts from the top navigation bar.

Select elephant and then choose the Processes tab.

Locate the row for the NameNode and click “Full log file.”

The Log Details page opens at the tail end of the NameNode log file. Scroll up to view earlier log messages.

Note: If the log file is very large, and you want to see messages near the top, scrolling in the Cloudera Manager UI will be slow. Other tools provide quick access to the entire log file.

Click “Download Full Log” (in the upper right corner of the Log Details page) to download the entire log.

3. Review the NameNode daemon’s standard error and standard output logs using Cloudera Manager.

Return to the Processes page for elephant.

Click the Stdout link for the NameNode instance. The standard output log appears. Review the file, then return to the Processes page.

Click the Stderr link for the NameNode instance. The standard error log appears. Review the file.

Note: If you want to locate these log files on disk, they can be found on elephant in the /var/log/hadoop-hdfs and /var/run/cloudera-scm-agent/process/nn-hdfs-NAMENODE/logs directories.

4. Using Cloudera Manager, review recent entries in the SecondaryNameNode logs.

To find the log, go to the HDFS Instances page for your cluster, click on the SecondaryNameNode role type, and then, on the Status page for the tiger host, click “Log File”.

5. Access the ResourceManager log file using the ResourceManager Web UI.

Start the ResourceManager Web UI (from Cloudera Manager’s YARN Status page or by specifying the URL http://horse:8088 in your browser).

Choose “Nodes” from the Cluster menu on the left side of the page.

Click the horse:8042 link to be taken to the NodeManager Web UI.

Expand the Tools menu on the left side of the page.

Click “Local logs.”

Finally, click the entry for the ResourceManager log file and review the file.
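Once a daemon log has been downloaded or located on disk, a quick per-level summary helps decide where to start reading. The sketch below is illustrative only: log_level_counts is a hypothetical helper name, and it assumes the common log4j layout in which each entry starts with a date, a time, and then the level (for example "2015-03-10 12:00:00,001 INFO ..."). If your logs use a different layout, the field number needs adjusting.

```shell
# Hypothetical helper (not part of the training environment).
# Usage: log_level_counts LOG_FILE...
# Prints "<LEVEL> <count>" for each log level found in the third field of a line.
log_level_counts() {
  awk '$3 ~ /^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)$/ { count[$3]++ }
       END { for (level in count) print level, count[level] }' "$@" | sort
}
```

Continuation lines such as Java stack traces do not carry a level in the third field and are skipped, so each count reflects log entries rather than lines.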

Hands-On Exercise: Using Flume to Put Data into HDFS

In this Hands-On Exercise you will use Flume to import dynamically generated data into HDFS. A very common use case for Flume is to collect access logs from a large number of Web servers on your network; we will simulate a simple version of this in the following exercise.

The diagram below shows the data flow that will occur once you complete the Exercise.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Adding the Flume Service

1. Add the Flume service on elephant and horse.

From the Cloudera Manager Home page, click the down arrow next to your cluster and choose “Add a Service”.

Select “Flume” and click “Continue”. Note that HDFS is a dependency; however, the HDFS service is already added. Click “Continue” again.

Use the “Select hosts” button to add the Flume Agent on both elephant and horse, then click OK and Continue. At the Congratulations screen click “Finish”.

Note: You may notice that there are now two new ‘Hosts’ configuration issues, this time on elephant and horse, that, like the one that appeared on monkey earlier, are related to memory overcommit validation thresholds. In a true production cluster, you would want to address all configuration issues; however, you can safely ignore these in the classroom environment.

2. Update the configuration for the Flume agent on elephant.

From the Cloudera Manager Home page, click the Flume link and then click on the Instances tab.

Select the “Agent” that resides on elephant, then click Configuration.

Delete the default contents of the two properties listed in the table below entirely and replace with the lines shown below.

Note: The Configuration File lines are also available in ~/training_materials/admin/scripts/flume-tail1.txt

Tip: You can expand the Configuration File text area to make it easier to edit in by dragging it out the bottom right corner of the text box.

Property            Value

Agent Name          tail1

Configuration File

tail1.sources = src1
tail1.channels = ch1
tail1.sinks = sink1
tail1.sources.src1.type = exec
tail1.sources.src1.command = tail -F /tmp/access_log
tail1.sources.src1.channels = ch1
tail1.channels.ch1.type = memory
tail1.channels.ch1.capacity = 500
tail1.sinks.sink1.type = avro
tail1.sinks.sink1.hostname = horse
tail1.sinks.sink1.port = 6000
tail1.sinks.sink1.batch-size = 1
tail1.sinks.sink1.channel = ch1

Click Save Changes.

3. Update the configuration for the Flume agent on horse.

From the Cloudera Manager Flume page, click on the Instances tab.

Select the Agent that resides on horse, then click Configuration.

Delete the default contents of the two properties listed in the table below entirely and replace with the lines shown below.

Note: The Configuration File lines are also available in ~/training_materials/admin/scripts/flume-collector1.txt

Property            Value

Agent Name          collector1

Configuration File

collector1.sources = src1
collector1.channels = ch1
collector1.sinks = sink1
collector1.sources.src1.type = avro
collector1.sources.src1.bind = horse
collector1.sources.src1.port = 6000
collector1.sources.src1.channels = ch1
collector1.channels.ch1.type = memory
collector1.channels.ch1.capacity = 500
collector1.sinks.sink1.type = hdfs
collector1.sinks.sink1.hdfs.path = hdfs://elephant/user/flume/collector1
collector1.sinks.sink1.hdfs.filePrefix = access_log
collector1.sinks.sink1.channel = ch1

Ensure that “collector1.sinks.sink1.hdfs.path = hdfs://elephant/user/flume/collector1” shown above is all on a single line.

Click “Save Changes.”
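The two agents only connect if the avro sink on elephant and the avro source on horse agree on the port. The check below is a minimal sketch, not part of the course materials: flume_prop and ports_match are hypothetical helper names, and the property keys are the ones used in the two configurations above.

```shell
# Hypothetical helpers (not part of the training environment).
# Usage: flume_prop PROPERTIES_FILE KEY
# Prints the value assigned to KEY in a "key = value" Flume configuration file.
flume_prop() {
  file="$1"
  key="$2"
  awk -F'=' -v k="$key" '
    { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
    name == k {
      value = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      print value
      exit
    }' "$file"
}

# Usage: ports_match TAIL1_FILE COLLECTOR1_FILE
# Succeeds when the tail1 avro sink port equals the collector1 avro source port.
ports_match() {
  sink_port="$(flume_prop "$1" tail1.sinks.sink1.port)"
  source_port="$(flume_prop "$2" collector1.sources.src1.port)"
  [ -n "$sink_port" ] && [ "$sink_port" = "$source_port" ]
}
```

It can be run against the flume-tail1.txt and flume-collector1.txt files in ~/training_materials/admin/scripts before pasting their contents into Cloudera Manager.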

4. Create the /user/flume/collector1 directory in HDFS to store the files.

On elephant:

$ sudo -u hdfs hdfs dfs -mkdir -p /user/flume/collector1

$ sudo -u hdfs hdfs dfs -chown -R flume /user/flume

Starting the Data Generator

1. Open a new terminal window on elephant (or an ssh connection to elephant). In this terminal window, run the accesslog-gen.sh shell script, which simulates a Web server creating log files. This shell script also rotates the log files regularly.

On elephant:

$ accesslog-gen.sh /tmp/access_log

Note: The accesslog-gen.sh script is specific to the training environment and is not part of CDH.

2. Open a second new terminal window on elephant (or an ssh connection to elephant). Verify that the log file has been created. Notice that the log file is rotated periodically.

On elephant:

$ ls -l /tmp/access*

-rw-rw-r-- 1 training training 498 Nov 15 15:12 /tmp/access_log

-rw-rw-r-- 1 training training 997 Nov 15 15:12 /tmp/access_log.0

-rw-rw-r-- 1 training training 1005 Nov 15 15:11 /tmp/access_log.1
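The listing shows the usual rotation pattern: the live file is renamed to .0, the previous .0 becomes .1, and a fresh live file is started. This is also why the tail1 source uses tail -F rather than tail -f, since -F follows the file by name and reopens it after each rotation. The sketch below only illustrates that renaming scheme. It is not the training environment's generator script, and rotate_log and its generation count are hypothetical.

```shell
# Hypothetical sketch of log rotation; NOT the training environment's generator script.
# Usage: rotate_log LOG_FILE [GENERATIONS]
# Keeps GENERATIONS rotated copies (default 2): LOG_FILE.0 is the newest.
rotate_log() {
  log="$1"
  keep="${2:-2}"
  i=$(( keep - 1 ))
  # Shift older generations up by one, dropping the oldest.
  while [ "$i" -ge 0 ]; do
    if [ -f "$log.$i" ]; then
      if [ $(( i + 1 )) -ge "$keep" ]; then
        rm -f "$log.$i"
      else
        mv "$log.$i" "$log.$(( i + 1 ))"
      fi
    fi
    i=$(( i - 1 ))
  done
  # Move the live file aside and start a new, empty one.
  if [ -f "$log" ]; then
    mv "$log" "$log.0"
  fi
  : > "$log"
}
```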

Starting the Flume Collector Agent

Here you start the Flume agent that will insert the data into HDFS. This agent receives data from the source Flume agent.

1. Start the collector1 Flume Agent on horse.

In Cloudera Manager, go to Flume’s Instances tab. Select the Agent hosted on horse.

From the “Actions for Selected” menu choose Start. In the “Start” window that appears, click “Start”.

In the “Command Details: Start” screen wait for confirmation that the agent started successfully. Click Close.

Starting the Flume Source Agent

Here you start the agent that reads the source log files and passes the data along to the collector agent you have already started.

1. Start the tail1 Flume Agent on elephant.

From the Cloudera Manager Flume Instances tab, select the Agent hosted on elephant.

From the “Actions for Selected” menu choose Start. In the “Start” window that appears, click “Start”.

In the “Command Details: Start” screen wait for confirmation that the agent started successfully. Click Close.

Viewing Data in HDFS

1. Confirm the data is being written into HDFS.

In Cloudera Manager browse to the HDFS page for your cluster and click on File Browser.

Drill down into /user/flume/collector1. You should see many access_log files.

2. View Metric Details.

Return to the Flume page and click on the Metric Details tab. Here you can see details related to the Channels, Sinks, and Sources of your running Flume agents. If you are interested, an explanation of the metrics available is at http://bit.ly/flumemetrics.

Increase the File Size in HDFS (Optional)

These steps are optional, but may be of interest to some students.

1. Edit the collector1 agent configuration settings on horse by adding these three additional name-value pairs:

collector1.sinks.sink1.hdfs.rollSize = 2048

collector1.sinks.sink1.hdfs.rollCount = 100

collector1.sinks.sink1.hdfs.rollInterval = 60

Click Save Changes.

2. From the Flume Status page, click on the Stale Configuration - refresh needed icon and follow the prompts to refresh the cluster.

3. Execute hdfs dfs -ls /user/flume/collector1 in a terminal window and note the file size of the more recent content posted by Flume to HDFS.
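The three roll settings act as independent triggers on the HDFS sink: the current file is closed and a new one started as soon as any one of them is reached (size in bytes, number of events, or age in seconds), and a value of 0 disables that trigger. The sketch below restates that rule as shell arithmetic with the values from step 1. It is an illustration only, not Flume code, and should_roll is a hypothetical name.

```shell
# Illustration of the roll rule only; this is not Flume code.
# Thresholds default to the values used in step 1; 0 disables a trigger.
ROLL_SIZE="${ROLL_SIZE:-2048}"        # bytes written to the current file
ROLL_COUNT="${ROLL_COUNT:-100}"       # events written to the current file
ROLL_INTERVAL="${ROLL_INTERVAL:-60}"  # seconds since the current file was opened

# Usage: should_roll BYTES EVENTS SECONDS
# Succeeds when at least one enabled threshold has been reached.
should_roll() {
  bytes="$1"
  events="$2"
  seconds="$3"
  if [ "$ROLL_SIZE" -gt 0 ] && [ "$bytes" -ge "$ROLL_SIZE" ]; then return 0; fi
  if [ "$ROLL_COUNT" -gt 0 ] && [ "$events" -ge "$ROLL_COUNT" ]; then return 0; fi
  if [ "$ROLL_INTERVAL" -gt 0 ] && [ "$seconds" -ge "$ROLL_INTERVAL" ]; then return 0; fi
  return 1
}
```

With these values a file holding 1500 bytes and 40 events still rolls once it has been open for 60 seconds, so files can end up smaller than 2048 bytes when the generator is slow.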

Viewing the Logs

1. Check the log files to see messages.

In Cloudera Manager choose Diagnostics > Logs.

Click Select Sources and configure as follows:

• Uncheck all sources except Flume
• Set the Minimum Log Level to INFO
• Leave the timeframe of your search set to 30 minutes

Click Search.

Browse through the logged actions from both Flume agents.

Cleaning Up

1. Stop the log generator by hitting Ctrl-C in the first terminal window.

2. Stop both Flume agents from the Flume Instances page in Cloudera Manager.

3. Remove the generated access log files from the /tmp directory to clear up space on your virtual machine.

On elephant:

$ rm -rf /tmp/access_log*

This is the end of the Exercise.

Hands-On Exercise: Importing Data with Sqoop

For this exercise you will import data from a relational database using Sqoop. The data you load here will be used in a subsequent exercise.

Consider the MySQL database movielens, derived from the MovieLens project from the University of Minnesota. (See note at the end of this exercise.) The database consists of several related tables, but we will import only two of these: movie, which contains about 3,900 movies; and movierating, which has about 1,000,000 ratings of those movies.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Perform all steps in this exercise on elephant.

Reviewing the Database Tables

First, review the database tables to be loaded into Hadoop.

1. Log on to MySQL.

On elephant:

$ mysql --user=training --password=training movielens

2. Review the structure and contents of the movie table.

On elephant:

mysql> DESCRIBE movie;

. . .

mysql> SELECT * FROM movie LIMIT 5;

3. Note the column names for the table.

____________________________________________________________________________________________

4. Review the structure and contents of the movierating table.

On elephant:

mysql> DESCRIBE movierating;

. . .

mysql> SELECT * FROM movierating LIMIT 5;

5. Note these column names.

____________________________________________________________________________________________

6. Exit MySQL.

On elephant:

mysql> quit;

Adding the Sqoop 1 Client

1. Add the Sqoop 1 Client gateway on elephant.

From the Home page in Cloudera Manager, click the down arrow icon next to Cluster 1 and choose “Add a Service”.

Select Sqoop 1 Client and click Continue.

At the “Custom Role Assignments” page, click on the Select hosts box and choose to add the Gateway on elephant. Click Continue.

The “Progress” page appears. Once the client configuration has been deployed successfully, click Continue. At the “Congratulations” screen click Finish.

2. Using sudo, create a symlink to the MySQL JDBC driver.

On elephant:

$ sudo ln -s /usr/share/java/mysql-connector-java.jar \
/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/sqoop/lib/

Now run the command below to confirm the symlink was properly created.

On elephant:

$ readlink -f \
/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/sqoop/lib/mysql-connector-java.jar

If the symlink was properly defined, the command should return the /usr/share/java/mysql-connector-java.jar path.
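The same confirmation can be turned into a pass/fail check, which is handy if the link has to be recreated after a parcel upgrade changes the directory name. This is a minimal sketch: check_symlink is a hypothetical helper name, and it relies on the same readlink -f behavior the exercise already uses.

```shell
# Hypothetical helper (not part of the training environment).
# Usage: check_symlink LINK EXPECTED_TARGET
# Succeeds only when LINK is a symlink that resolves to EXPECTED_TARGET.
check_symlink() {
  link="$1"
  expected="$2"
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2
    return 1
  fi
  resolved="$(readlink -f "$link")" || return 1
  if [ "$resolved" != "$expected" ]; then
    echo "$link resolves to $resolved, expected $expected" >&2
    return 1
  fi
}
```

Called with the link path under the Sqoop lib directory and /usr/share/java/mysql-connector-java.jar as the expected target, it prints nothing and returns success when the link is correct.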

Importing with Sqoop

You invoke Sqoop on the command line to perform several commands. With it you can connect to your database server to list the databases (schemas) to which you have access, and list the tables available for loading. For database access, you provide a connect string to identify the server, and your username and password.

1. Show the commands available in Sqoop.

On elephant:

$ sqoop help

You can safely ignore the warning that Accumulo does not exist since this course does not use Accumulo.

2. List the databases (schemas) in your database server.

On elephant:

$ sqoop list-databases \
--connect jdbc:mysql://localhost \
--username training --password training

(Note: Instead of entering --password training on your command line, you may prefer to enter -P, and let Sqoop prompt you for the password, which is then not visible when you type it.)
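For unattended runs, where an interactive prompt is not practical, Sqoop also accepts a --password-file option that reads the password from a file. The sketch below prepares such a file. write_password_file and the file location are hypothetical, and the sqoop command is shown only as a comment because how the path is resolved (local file system versus HDFS) can depend on your Sqoop version and configuration.

```shell
# Hypothetical helper (not part of the training environment).
# Usage: write_password_file PATH PASSWORD
# Stores PASSWORD without a trailing newline in a file readable only by its owner.
write_password_file() {
  path="$1"
  password="$2"
  # The restrictive umask applies from the moment the file is created.
  ( umask 077 && printf '%s' "$password" > "$path" ) || return 1
  chmod 400 "$path"
}

# Example (assumed paths, not run here):
#   write_password_file "$HOME/.sqoop-training.pw" training
#   sqoop list-databases \
#     --connect jdbc:mysql://localhost \
#     --username training \
#     --password-file "file://$HOME/.sqoop-training.pw"
```

Writing the file with printf rather than echo avoids a trailing newline, which would otherwise become part of the password that Sqoop reads.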

3. List the tables in the movielens database.

On elephant:

$ sqoop list-tables \
--connect jdbc:mysql://localhost/movielens \
--username training --password training

4. Import the movie table into Hadoop.

On elephant:

$ sqoop import \
--connect jdbc:mysql://localhost/movielens \
--table movie --fields-terminated-by '\t' \
--username training --password training

The --fields-terminated-by '\t' option separates the fields in the HDFS file with the tab character, which is sometimes useful if users will be working with Hive and Pig.

Warnings that packages such as hbase, hive-hcatalog, and accumulo are not installed are expected. It is not a problem that these packages are not installed on your system.

Notice how the INFO messages that appear show that a MapReduce job consisting of four map tasks was completed.
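Four is Sqoop's default number of map tasks; the -m (or --num-mappers) option changes it. Each mapper imports one slice of the table, normally obtained by dividing the range of the primary key. The arithmetic below is only an approximation of that idea, assuming a hypothetical integer key running from 1 to 3900. Sqoop's own splitter may place the boundaries differently, and split_ranges is not a Sqoop command.

```shell
# Illustration only; this is not how Sqoop is invoked and not Sqoop's exact algorithm.
# Usage: split_ranges MIN MAX N
# Prints N contiguous "lo hi" ranges that together cover MIN..MAX.
split_ranges() {
  min="$1"
  max="$2"
  n="$3"
  total=$(( max - min + 1 ))
  lo="$min"
  i=0
  while [ "$i" -lt "$n" ]; do
    # Spread any remainder over the first ranges so sizes differ by at most one.
    size=$(( total / n ))
    if [ "$i" -lt $(( total % n )) ]; then
      size=$(( size + 1 ))
    fi
    hi=$(( lo + size - 1 ))
    echo "$lo $hi"
    lo=$(( hi + 1 ))
    i=$(( i + 1 ))
  done
}
```

Under these assumptions, split_ranges 1 3900 4 yields four slices of 975 keys each, which corresponds to the four part-m-0000n files the import writes.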

5. Verify that the command has worked.

On elephant:

$ hdfs dfs -ls movie

$ hdfs dfs -tail movie/part-m-00000

6. Import the movierating table into Hadoop using the command in step 4 as an example.

Verify that the movierating table was imported using the command in step 5 as an example or by using the Cloudera Manager HDFS page’s File Browser.

7. Optionally observe the results in Cloudera Manager’s YARN Applications page.

Navigate to the YARN Applications page.

Notice the last two YARN applications that ran (movie.jar and movierating.jar).

Explore the job details for either or both of these jobs.
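Beyond eyeballing the tail of a part file, it can be worth confirming that every imported row has the same number of tab-separated fields, since a later exercise defines a Hive table over this data with FIELDS TERMINATED BY '\t'. The filter below is a small sketch for that purpose, and field_counts is a hypothetical name. It reads rows on standard input, so it can be fed from hdfs dfs -cat.

```shell
# Hypothetical helper (not part of the training environment).
# Usage: some-command | field_counts
# Prints "<fields-per-row> <number-of-rows>" for each distinct field count seen.
field_counts() {
  awk -F'\t' '{ seen[NF]++ } END { for (n in seen) print n, seen[n] }' | sort -n
}

# Example (requires the cluster, not run here):
#   hdfs dfs -cat movierating/part-m-00000 | field_counts
```

A single output line means every row has the same number of fields; more than one line points to rows whose delimiters need a closer look.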

This is the end of the Exercise.

Note:

This exercise uses the MovieLens data set, or subsets thereof. This data is freely available for academic purposes, and is used and distributed by Cloudera with the express permission of the UMN GroupLens Research Group. If you would like to use this data for your own research purposes, you are free to do so, as long as you cite the GroupLens Research Group in any resulting publications. If you would like to use this data for commercial purposes, you must obtain explicit permission. You may find the full dataset, as well as detailed license terms, at http://www.grouplens.org/node/73

Hands-On Exercise: Querying HDFS With Hive and Cloudera Impala

In this exercise, you will add Hive and Cloudera Impala services to your cluster, enabling you to query data stored in HDFS.

You will start by adding the ZooKeeper service to your cluster. ZooKeeper is a prerequisite for HiveServer2, which you will deploy when you add the Hive service.

Then you will add the Hive service, including a Hive Metastore Server and HiveServer2, to your Hadoop cluster, and configure the service.

Next, you will add the Impala service to your cluster and configure the service.

Then you will populate HDFS with data from the movierating table and run queries against it using both Hive and Impala.

At the end of this exercise, you should have daemons deployed on your five hosts as follows (new services added in this exercise are highlighted in blue):

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Adding the ZooKeeper Service

In this task, you will add a ZooKeeper service to your cluster. A running ZooKeeper service is a prerequisite for many other services so you will add this service now. When you add additional services later in the class, you may notice the Exercise instructions have you select ZooKeeper as part of the set of dependencies for the new service to use.

1. From the Cloudera Manager Home page, select the ‘Add a Service’ menu option from the drop-down menu to the right of Cluster 1.

The Add Service Wizard appears.

2. Select ZooKeeper and click Continue.

The Customize Role Assignments page appears.

3. Specify the following host assignments:

• Server – elephant, horse, and tiger but not lion or monkey

Click OK and then click Continue.

4. The Review Changes page appears.

Review the default values specified on this page and click Continue.

5. Progress messages appear on the Progress page.

When the adding of the service has completed, click Continue.

6. The Congratulations page appears.

Click Finish.

7. The Cloudera Manager Home page appears.

The zookeeper service now appears in the list of services.

A health issue icon may appear next to the new ZooKeeper service temporarily; however, this should go away momentarily and the status should change to Good Health.

Note: You may notice that there is now one more ‘Hosts’ configuration issue, this time on tiger, that, like the ones that appeared earlier, is related to memory overcommit validation thresholds. In a true production cluster, you would want to address all configuration issues; however, you can safely ignore this one in the classroom environment.
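Step 3 places ZooKeeper servers on three hosts. ZooKeeper stays available only while a majority of its servers are running, which is why ensembles are normally given an odd number of servers. The arithmetic is sketched below; quorum_size and tolerated_failures are hypothetical helper names used purely for illustration.

```shell
# Illustration of ZooKeeper majority arithmetic; helper names are hypothetical.
# Usage: quorum_size N
# Prints the smallest number of servers that forms a majority of N.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Usage: tolerated_failures N
# Prints how many of N servers can fail while a majority is still running.
tolerated_failures() {
  echo $(( $1 - ( $1 / 2 + 1 ) ))
}
```

For the three servers in this exercise the quorum is 2, so the service survives the loss of one server. A fourth server would raise the quorum to 3 without allowing any additional failure.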

Adding the Hive Service to Your Cluster

In this task, you will add the Hive service to your cluster using Cloudera Manager.

You will configure the Hive service to use the MySQL metastore database and to include a HiveServer2 instance.

Hive and Impala can both make use of a single, common Hive metastore. Recall that you created a few databases by running a SQL script prior to installing your CDH cluster. One of the databases you created is named metastore, which will be used by Hive and Impala as a common metastore for storing table definitions.

At the end of the task, you will run a simple Hive command to verify that the Hive service has been added.

1. From the Cloudera Manager Home page, select the “Add a Service” option for Cluster 1.

The Add Service Wizard appears.

2. Select Hive and click Continue.

The “Select the set of dependencies for your new service” page appears.

3. Select the row containing the hdfs, yarn, and zookeeper services, then click Continue.

The Customize Role Assignments page appears.

4. Specify host assignments as follows:

• Gateway – elephant only
• Hive Metastore Server – elephant only
• WebHCat Server – Do not select any hosts
• HiveServer2 – elephant only

Verify that you have selected only elephant and not any additional hosts for the Gateway, Hive Metastore Server, and HiveServer2 roles.

Click Continue.

5. The “Database Setup” page appears.

Specify values for the database as follows:

• Database Host Name – lion
• Database Type – MySQL
• Database Name – metastore
• User Name – hiveuser
• Password – password

Click “Test Connection” and verify that connection to the MySQL database is successful.

Click Continue.

6. The Review Changes page appears.

Review the default values specified on this page and click Continue.

7. Progress messages appear on the Progress page.

When the adding of the service has completed, click Continue.

8. The Congratulations page appears.

Click Finish.

9. The Home page appears.

A status indicator shows you that the hive service is in good health.

10. Verify that you can run a Hive command from the Beeline shell.

On elephant:

$ beeline -u jdbc:hive2://elephant:10000/default \
-n training

In the Beeline shell, type the following command:

> SHOW TABLES;

No tables should appear, because you haven’t defined any Hive tables yet, but you should not see error messages.

11. Exit the Beeline shell.

> !quit

Adding the Impala Service

In this task, you will add the Impala service to your cluster.

Cloudera Manager will automatically configure Impala to use the Hive Metastore service that you created earlier in this exercise.

1. From the Cloudera Manager Home page, select the Add a Service option for Cluster 1.

The Add Service Wizard appears.

2. Select Impala and click Continue.

The “Select the set of dependencies for your new service” page appears.

3. Select the row containing the hdfs and hive services, then click Continue.

The Customize Role Assignments page appears.

4. Specify host assignments as follows:

• Impala Catalog Server – horse
• Impala StateStore – horse
• Impala Daemon – elephant, horse, monkey and tiger

Click Continue.

Note: When you added the Hive service, you specified elephant as a Gateway host, which caused the Hive client to be added on elephant. With Impala, the Impala client, impala-shell, is automatically added on all hosts running Impala.

5. The Review Changes page appears.

Review the default values specified for the Impala Daemon Scratch Directories on this page and click Continue.

6. Progress messages appear on the Progress page.

When the adding of the service has completed, click Continue.

7. The Congratulations page appears.

Click Finish.

8. Restart the HDFS service.

After adding Impala, on the Cloudera Manager home page you will notice that the HDFS service has stale configurations as indicated by the icon that appears.

Click on the “Stale Configuration: Restart needed” icon. The “Review Changes” page appears.

Click “Restart Cluster” then click “Restart Now”. When the action completes click Finish.

9. Confirm the Impala services started.

Browse to the Impala page for your cluster and click on Instances.

You should see that the Impala Catalog Server, Impala Daemons, and Impala StateStore services have all started successfully.

Running Hive and Impala Queries

In this task, you will define the movierating table in Hive and run a simple query against the table. Then you will run the same query in Impala and compare performance.

Note: You already imported the movierating table into HDFS in the Importing Data With Sqoop exercise.

1. Review the movierating table data imported into HDFS during the Sqoop exercise.

On elephant:

$ hdfs dfs -cat movierating/part-m-00000 | head

2. Start the Beeline shell and connect to HiveServer2.

On elephant:

$ beeline -u jdbc:hive2://elephant:10000 -n training

3. Define the movierating table in Hive.

> CREATE EXTERNAL TABLE movierating
> (userid INT, movieid STRING, rating TINYINT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION '/user/training/movierating';

4. Verify that you created the movierating table in the Hive metastore.

On elephant:

> SHOW TABLES;

You should see an entry for the movierating table.

5. Run a simple Hive test query that counts the number of rows in the movierating table.

On elephant:

> SELECT COUNT(*) FROM movierating;

Browse to Cloudera Manager’s YARN Applications page to view the MapReduce job that was run when you executed the Hive query. In the Results tab, make note of the amount of time the query takes to execute when running in Hive.

6. Terminate the Beeline shell.

On elephant:

> !quit

7. StarttheImpalashell.

Onelephant:

$ impala-shell

8. ConnecttotheImpalaCatalogServerrunningonhorse.

> CONNECT horse;

9. Since you defined a new table after starting the Impala server on horse, you must now refresh that server’s copy of the Hive metadata.

> INVALIDATE METADATA;

10. In Impala, run the same query against the movierating table that you ran in Hive.

> SELECT COUNT(*) FROM movierating;

Compare the amount of time it took to run the query in Impala to the amount of time it took in Hive.

11. Exit the Impala shell.

> quit;

12. Explore Impala queries in Cloudera Manager.

In Cloudera Manager, from the Clusters menu’s Activities section, choose Impala Queries.

Notice that both of the queries you ran from the impala-shell appear in the Results panel.

For the ‘SELECT’ query that you ran, choose “Query Details” from the drop-down menu on the right. Browse through the query details, noting the information about the query that is available to you.
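For intuition about what both engines compute in this task, the following minimal Python sketch parses tab-delimited lines using the column layout from the CREATE EXTERNAL TABLE statement and counts them, which is the value SELECT COUNT(*) reports. The sample rows are invented for illustration and are not taken from the course data set, and parse_row and count_rows are hypothetical helper names.

```python
from typing import Iterable, NamedTuple


class MovieRating(NamedTuple):
    userid: int    # INT column
    movieid: str   # STRING column
    rating: int    # TINYINT column


def parse_row(line: str) -> MovieRating:
    """Split one tab-delimited line into the three movierating columns."""
    userid, movieid, rating = line.rstrip("\n").split("\t")
    return MovieRating(int(userid), movieid, int(rating))


def count_rows(lines: Iterable[str]) -> int:
    """Count non-empty lines, in the spirit of SELECT COUNT(*)."""
    return sum(1 for line in lines if line.strip())


# Invented sample data; the real file lives in HDFS under movierating/.
sample = ["1\t1193\t5\n", "1\t661\t3\n", "2\t1357\t4\n"]
rows = [parse_row(line) for line in sample]
print(rows[0], count_rows(sample))
```

Both Hive and Impala read the same files under /user/training/movierating, so the two counts should agree; only the execution time should differ.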

This is the end of the Exercise.

Hands-On Exercise: Using Hue to Control Hadoop User Access

In this exercise, you will configure a Hue environment that provides business analysts with the following capabilities:

• Submitting Pig, Hive, and Impala queries
• Managing definitions in the Hive Metastore
• Browsing the HDFS file system
• Browsing YARN applications

Users will be able to access their environments by using a Web browser, eliminating the need for administrators to install Hadoop client environments on the analysts’ systems.

At the end of this exercise, you should have daemons deployed on your five hosts as follows (new daemons shown in blue):

The Hue server will be deployed on monkey. The HttpFS and Oozie servers on monkey will support several Hue applications.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Adding an HttpFS Role Instance to the hdfs Service

In this task, you will use the Cloudera Manager wizard to add an HttpFS role instance to the hdfs service. The HttpFS role instance will reside on monkey. After adding the role instance, you will run a curl command from the command line to verify that HttpFS works correctly.

1. In Cloudera Manager, navigate to the HDFS Instances page.

2. Click Add Role Instances.

The “Add Role Instances to HDFS” page appears.

3. For HttpFS, specify monkey.

Click Continue.

4. The hdfs Role Instances page reappears. The HttpFS (monkey) role instance now appears in the list of role instances.

Notice that the status for this role instance is Stopped.

A status indicator shows you that the HDFS service has good health.

5. Start the HttpFS role instance.

In the HDFS Instances page, check the box next to HttpFS and from the Actions for Selected menu choose Start.

In the Start dialog window, click Start. When the Command Details window shows that the command completed successfully, click Close.

6. To verify HttpFS operation, run the HttpFS LISTSTATUS operation to examine the content in the /user/training directory in HDFS.

On elephant:

$ ssh training@monkey netstat -tan | grep :14000

$ curl -s "http://monkey:14000/webhdfs/v1/\

user/training?op=LISTSTATUS&user.name=training" \

| python -m json.tool

Note: The HttpFS REST API returns JSON objects. Piping the JSON objects to python -m json.tool makes the objects easier to read in standard output.
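If you would rather post-process the response than read it, the following minimal Python sketch builds the same URL that the curl command uses and extracts the entry names from a LISTSTATUS response. The helper names liststatus_url and entry_names are hypothetical, and the sample response is invented and abbreviated; a real response carries more fields per entry.

```python
import json
from urllib.parse import urlencode


def liststatus_url(host: str, path: str, user: str, port: int = 14000) -> str:
    """Build the HttpFS LISTSTATUS URL used by the curl command."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1/{path.lstrip('/')}?{query}"


def entry_names(response_text: str) -> list:
    """Return (name, type) pairs from a LISTSTATUS JSON response."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


# Abbreviated, invented response used only to exercise the parser.
sample = json.dumps({"FileStatuses": {"FileStatus": [
    {"pathSuffix": "movierating", "type": "DIRECTORY", "length": 0},
    {"pathSuffix": "weblog", "type": "DIRECTORY", "length": 0},
]}})

print(liststatus_url("monkey", "/user/training", "training"))
print(entry_names(sample))
```

The sketch does not contact the cluster; to use it against HttpFS you would fetch the URL yourself and pass the response body to entry_names.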

Adding the Oozie Service

In this task, you will add the Oozie service to your cluster. With Cloudera Manager, Oozie is a prerequisite for adding the Hue service. We won’t use these services in these exercises, but feel free to explore them on your own if you like.

You will configure the Oozie instance to reside on monkey and the Solr instance to run on tiger.

1. In Cloudera Manager, navigate to the Home page.

2. Select the Add a Service option for Cluster 1.

The Add Service Wizard appears.

3. Select Oozie and click Continue.

The “Select the set of dependencies for your new service” page appears.

4. Select the row containing the hdfs, yarn, and zookeeper services, then click Continue.

The Customize Role Assignments page appears.

5. Specify host assignments as follows:

• Oozie Server – monkey

Click Continue.

6. The Review Changes page appears.

Review the default values specified on this page and click Continue.

7. Progress messages appear on the Progress page.

When the adding of the service has completed, click Continue.

8. The Congratulations page appears.

Click Finish.

9. The Home page appears.

A status indicator shows you that the oozie service is in good health.

Adding the Hue Service

In this task, you will add a Hue service to your cluster, configuring the Hue instance to run on monkey.

1. From the Cloudera Manager Home page, select the Add a Service option for Cluster 1.

The Add Service Wizard appears.

2. Select Hue and click Continue.

The “Select the set of dependencies for your new service” page appears.

3. Select the row containing the hdfs, hive, impala, oozie, yarn, and zookeeper services, then click Continue.

The Customize Role Assignments page appears.

4. Specify host assignments as follows:

• Hue Server – monkey

Click Continue.

5. Progress messages appear on the Progress page.

When the adding of the service has completed, click Continue.

6. The Congratulations page appears.

Click Finish.

7. The Home page appears.

A status indicator shows you that the hue service is in good health.

8. Submit a Hadoop WordCount job so that there will be a MapReduce job entry that you can browse in Hue after you start the Hue UI.

On elephant:

$ hadoop jar /opt/cloudera/parcels/CDH-5.3.2-1.cdh\

5.3.2.p0.10/lib/hadoop-mapreduce/hadoop-mapreduce-\

examples.jar wordcount /tmp/shakespeare.txt test_output

9. Install Spark client configuration on elephant.

Previously you ran the Spark shell on monkey. Here you will add the Spark Gateway role on elephant so that you can run the Spark shell from elephant.

Navigate to the Spark Instances page for your cluster and click Add Role Instances.

Add the Gateway role to elephant.

Click on the “Stale configuration - client configuration redeployment needed” icon and in the “Cluster 1 Stale Configurations” page click “Deploy Client Configuration”.

In the Deploy Client Configuration page click “Deploy Client Configuration”.

In the Progress screen, wait for the commands to complete, then click Finish.

10. Start the Spark Shell so that there will be a Spark job entry in Hue.

On elephant:

$ spark-shell --master yarn-client

Leave the spark-shell open in the terminal for the rest of this exercise.

Exploring the Hue User Interface

In this task, you will log in to the Hue UI as an administrative user and briefly explore the following applications: Hue Home page, Hive UI, Cloudera Impala Query UI, Pig Editor, File Browser, Metastore Manager, Job Browser, Hue Shell, User Admin, and Help.

You will also explore how the Hue UI reports misconfiguration and determine why you cannot use the Job Designer and Oozie Editor/Dashboard applications.

Logging Into Hue

1. Maximize the browser window to give Hue enough space to display as many options as possible on its top menu.

2. View the Hue UI.

Access the “Hue Web UI” from the Cloudera Manager Hue Status page for your cluster or just browse to the URL at http://monkey:8888.

3. Log in to Hue.

Note: As the message box in the browser indicates, as the first person to log in to this Hue service, you are actually defining the Hue superuser credentials in this step.

Type in admin as the user, with the password training, then click “Create Account”.

4. The Quick Start Wizard displays.

Click the Home icon.

At the “Did you know?” dialog, click “Got it, prof!”

The “My documents” page appears.

Access Hive Using Hue

1. If you completed the Querying HDFS With Hive and Cloudera Impala exercise, start the Hive Query Editor by selecting Query Editors > Hive.

Enter the following query to verify that Hive is working in Hue:

SHOW TABLES;

Click Execute. The result of the query should be the movierating table.

Enter another query to count the number of records in the movierating table:

SELECT COUNT(*) FROM movierating;

Give it some time to complete. The UI will first show the Log tab contents, then it should eventually show the Results tab. The query should run successfully.

Access Impala Using Hue

1. If you completed the Querying HDFS With Hive and Cloudera Impala exercise, start the Impala Query Editor by selecting Query Editors > Impala.

Enter the following query to verify that Impala is working in Hue:

SHOW TABLES;

Click Execute. The result of the query should be the movierating table.

Enter another query to count the number of records in the movierating table:

SELECT COUNT(*) FROM movierating;

The query should run successfully.

Access the Metastore Using Hue

1. Start the Metastore Manager by selecting Data Browsers > Metastore Tables.

The Metastore Manager appears with an entry for the movierating table.

Select the entry for the movierating table.

The schema for the Hive movierating table, which you created in the Hive exercise, appears in the Metastore Manager.

Notice the list of actions available in the ACTIONS menu.

Access Pig Using Hue

1. Start the Pig UI by selecting Query Editors > Pig.

The Pig Query Editor appears.

You can edit and save Pig scripts using Hue’s Pig Query Editor in your current Hue deployment.

Access HDFS Using Hue

1. Click on the File Browser (document icon) towards the top right of the Hue UI.

This opens the File Browser application.

Browse the HDFS file system. If you wish, execute some hdfs dfs commands from the command line to verify that you obtain the same results from the command line and the Hue File Browser.

Note the Upload menu as well. You could upload files to HDFS through Hue.

There are also Actions available giving the Hue user options to Rename, Move, Download or change permissions on HDFS files (assuming the user has the permissions to do so).

2. In the File Browser, navigate to the /user/admin directory.

On the first Hue login, Hue created a superuser (in this case, the admin user) and an HDFS path for that user (in this case, the /user/admin path).

3. In the File Browser, navigate to the /user/training/test_output directory, the output directory of the WordCount job that you ran before starting the Hue UI.

4. Click the entry for the part-r-00000 file, the output file from the WordCount job.

A read-only editor showing the contents of the part-r-00000 file appears.

Browse YARN Applications Using Hue

1. Select the Job Browser (list icon) option.

An entry for the Hive job that you ran earlier appears in the Hue Job Browser.

Specify training in the Username field.

An entry for the MapReduce wordcount job you ran in the previous step appears with the status “SUCCEEDED.”

Another entry for the Spark shell you should still have running appears with the status “RUNNING.”

Browse the completed “wordcount” job details by clicking on the link in the ID column and then looking through the details in the Attempts, Tasks, Metadata, and Counters tabs.

If you are interested, look in Cloudera Manager’s YARN Applications page for your cluster, locate the entry for the same wordcount job, and follow the “Application Details” link, which takes you to the HistoryServer Web UI. Compare the details you find there with the information available in the Hue Job Browser.

2. Back in the terminal window on elephant, type exit to end the Spark interactive shell.

Browse Users, Documentation, Settings and Logs of Hue

1. Start the User Management Tool by selecting the “admin” menu (cog and wheels icon) and then choosing “Manage Users.”

The User Admin screen appears where you can define Hue users and groups and set permissions.

Notice the automatically created entry for the admin user.

You will create another user and a group in the next task.

2. Click the Documentation (question mark) icon.

Hue UI user documentation appears.

3. Click the About Hue icon (to the left of the Home icon).

The Quick Start Wizard’s Check Configuration tab shows “All OK. Configuration check passed.”

Choose the About Hue top-level menu and click into the Configuration tab to examine Hue’s configuration.

Click the Server Logs tab to examine Hue’s logs.

Setting up the Hue Environment for Business Analysts

Consider a scenario where you have been given a requirement to set up a Hue environment for business analysts. The environment will allow analysts to submit Hive and Impala queries, edit and save Pig queries, browse HDFS, manage table definitions in the Hive Metastore, and browse Hadoop jobs. Analysts who have this environment will not need Hadoop installed on their systems. Instead, they will access all the Hadoop functionality that they need through a Web browser.

You will use the User Admin application to set up the analysts’ Hue environment.

1. Verify that you are still logged in to Hue as the admin user.

2. Activate the Hue User Management tool by selecting admin > Manage Users.

3. Select Groups.

4. Click Add group and name the new group analysts.

Configure the permissions by selecting the ones listed below:

• about.access
• beeswax.access
• filebrowser.access
• hbase.access
• help.access
• impala.access
• jobbrowser.access
• metastore.write
• metastore.access
• pig.access

Click Add group.

5. Select Users.

6. Add a Hue user named fred with the password training.

In the “Step 2: Names and Groups” tab, make fred a member of the analysts group. However, make sure that fred is not a member of the default group.

Click “Add User”.

7. Sign out of the Hue UI (using the arrow icon in the top right of the screen).

8. Log back in to the Hue UI as user fred with password training.

Verify that in the session for fred, only the Hue applications configured for the analysts group appear. For example, the Administration menu does not allow fred to manage users. Fred also has no access to the Workflows, Search, and Security menus that are available to the admin user.
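The effect of the group settings can be pictured with a small model. The following Python sketch is a simplified illustration, not Hue's actual implementation: it assumes an application is visible to a user when one of the user's groups holds the matching "<app>.access" permission. The function name can_access and the app name "oozie" used in the second check are assumptions made for this example.

```python
# Permissions selected for the analysts group in this task.
GROUP_PERMISSIONS = {
    "analysts": {
        "about.access", "beeswax.access", "filebrowser.access",
        "hbase.access", "help.access", "impala.access",
        "jobbrowser.access", "metastore.write", "metastore.access",
        "pig.access",
    },
}

# fred belongs only to analysts, not to the default group.
USER_GROUPS = {"fred": {"analysts"}}


def can_access(user: str, app: str) -> bool:
    """True if any of the user's groups holds the '<app>.access' permission."""
    wanted = f"{app}.access"
    return any(wanted in GROUP_PERMISSIONS.get(group, set())
               for group in USER_GROUPS.get(user, set()))


print(can_access("fred", "impala"))  # granted through the analysts group
print(can_access("fred", "oozie"))   # not among the analysts permissions
```

Keeping fred out of the default group matters in this model because any permissions held by that group would otherwise be added to what fred can see.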

This is the end of the Exercise.

Hands-On Exercise: Configuring HDFS High Availability

In this exercise, you will reconfigure HDFS, eliminating the NameNode as a single point of failure for your Hadoop cluster.

You will start by modifying the hdfs service’s configuration to enable HDFS high availability.

You will then shut down services that you will no longer use in this exercise or other subsequent exercises.

Next, you will enable automatic failover for the NameNode. Automatic failover uses the ZooKeeper service that you added to your cluster in an earlier exercise. ZooKeeper is a prerequisite for HDFS HA automatic failover.

Once you have enabled HDFS high availability with automatic failover, you will intentionally bring one of the servers down as a test. HDFS services should still be available, but they will be served by a NameNode running on a different host.

At the end of this exercise, you should have daemons deployed and running on your five hosts as follows (new daemons shown in blue):

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Bringing Down Unneeded Services

Since you will no longer use Hue, Oozie, Impala, or Hive for the remaining exercises, you can stop their services to improve your cluster’s performance.

1. In Cloudera Manager, navigate to the Home page.

2. Stop the hue service as follows:

In the row for the hue service, click Actions > Stop.

Click Stop in the confirmation window.

Click Close after messages in the Command Details: Stop page indicate that the hue service has stopped.

The Home page reappears. The status of the hue service should have changed to Stopped.

3. Using steps similar to the steps you followed to stop the hue service, stop the oozie service.

4. Stop the impala service.

5. Stop the hive service.

6. Stop the flume service if it is running.

Verify that the only services that are still up and running on your cluster are the hdfs, yarn, spark, zookeeper, and mgmt services. All of these services should have good health.

Enabling HDFS High Availability

In this task, you will update your Hadoop configuration to use HDFS high availability.

1. Create the directories in which the JournalNodes will store edits.

On elephant:

$ sudo mkdir /dfs/jn

$ sudo chown hdfs:hadoop /dfs/jn

$ ssh training@horse sudo mkdir /dfs/jn

$ ssh training@horse sudo chown hdfs:hadoop /dfs/jn

$ ssh training@tiger sudo mkdir /dfs/jn

$ ssh training@tiger sudo chown hdfs:hadoop /dfs/jn

2. In Cloudera Manager, browse to the HDFS Configuration page for your cluster.

Select the Service-Wide category.

For ZooKeeper Service, click in the Value area and choose ZooKeeper.

Save Changes.

Now select the Instances tab.

The hdfs Role Instances page appears.

Observe that the hdfs service comprises the following role instances:

• Balancer on horse
• DataNodes on elephant, monkey, horse, and tiger
• The active NameNode on elephant
• The SecondaryNameNode on tiger
• An HttpFS server on monkey

The list of role instances will change after you enable high availability.

3. Click “Enable High Availability”.

4. The “Getting Started” page appears.

Change the Nameservice Name to mycluster.

Click Continue.

5. The “Assign Roles” page appears.

Specify the following:

• NameNode Hosts
  • elephant (Current)
  • tiger
• JournalNode Hosts
  • elephant, horse, tiger

Click Continue.

6. The “Review Changes” page appears.

Specify the value /dfs/jn in all three JournalNode Edits Directory fields.

Click Continue.

The “Progress” page appears. The messages shown in the screenshot below appear as Cloudera Manager enables HDFS high availability.

Note: Formatting the name directories of the current NameNode will fail. As described in the Cloudera Manager interface, this is expected.

When the process of enabling high availability has finished, click Continue.

An informational message appears informing you of post-setup steps regarding the Hive Metastore. You will not perform the post-setup steps because you will not be using Hive for any remaining exercises.

Click Finish.

7. The hdfs Role Instances page appears.

Observe that the hdfs service now comprises the following role instances:

• Balancer on horse
• DataNodes on elephant, tiger, horse, and monkey
• Failover Controllers on elephant and tiger
• An HttpFS server on monkey
• JournalNodes on elephant, tiger, and horse
• The active NameNode on elephant
• The standby NameNode on tiger
• No SecondaryNameNode
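The choice of three JournalNodes follows from how quorum-based journaling works: the active NameNode must be able to write each edit to a majority of the JournalNodes. The short Python sketch below works through that arithmetic; the function names majority and tolerated_failures are hypothetical and exist only for this illustration.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """JournalNodes that can be down while a majority is still reachable."""
    return (journal_nodes - 1) // 2


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

With the three JournalNodes on elephant, tiger, and horse, two must be reachable and one may fail. A fourth JournalNode would not raise the number of tolerated failures, which is why odd counts are the usual choice.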

Verifying Automatic NameNode Failover

In this task, you will restart the active NameNode, bringing it down and then up. The standby NameNode will transition to the active state when the active NameNode goes down, and the formerly active NameNode will transition to the standby state.

Then you will restart the new active NameNode in order to restore the original states of the two NameNodes.

1. Navigate to the HDFS service’s Instances tab.

2. In the “Federation and High Availability” section of the page, observe that the state of one of the NameNodes is active and the state of the other NameNode is standby.

3. Scroll down to the “Role Instances” section.

4. In the Role Instances section of the page, select the checkbox to the left of the entry for the active NameNode.

5. Click Actions for Selected > Restart.

Click Restart to confirm that you want to restart the instance.

6. Wait for the restart operation to complete. When it has successfully completed, click Close.

Verify that the states of the NameNodes have changed: the NameNode that was originally active (elephant) is now the standby, and the NameNode that was originally the standby (tiger) is now active. If the Cloudera Manager UI does not immediately reflect this change, give it a few seconds and it will.

7. Go to Diagnostics > Events and notice the many recent event entries related to the restarting of the NameNode.

8. Back in the HDFS Instances tab, restart the NameNode that is currently the active NameNode.

After the restart has completed, verify that the states of the NameNodes have again changed.
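The states you should expect after each restart can be summarized with a toy model. The Python sketch below is only an illustration of the expected outcome under automatic failover, not a description of how the failover controllers work internally; restart_active is a hypothetical function name.

```python
def restart_active(states: dict) -> dict:
    """Model restarting the active NameNode: the standby takes over,
    and the restarted NameNode rejoins as the standby."""
    return {host: ("standby" if state == "active" else "active")
            for host, state in states.items()}


initial = {"elephant": "active", "tiger": "standby"}
after_first = restart_active(initial)        # first restart in this task
after_second = restart_active(after_first)   # second restart in this task
print(after_first)
print(after_second == initial)
```

After the first restart tiger is active and elephant is the standby; the second restart returns both NameNodes to their original states.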

This is the end of the Exercise.

Hands-On Exercise: Using the Fair Scheduler

In this exercise, you will submit some jobs to the cluster and observe the behavior of the Fair Scheduler.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

1. Adjust the YARN memory setting and restart the cluster.

In order to demonstrate the Fair Scheduler, you will need to increase the amount of memory that YARN containers are permitted to use on your cluster.

In Cloudera Manager, go to the YARN (MR2 Included) Configuration page.

In the search box, search for yarn.nodemanager.resource.memory-mb

Change the value of this parameter to 3 GB in both of the Role Groups where it appears.

Save the Changes.

Navigate to the YARN Status page and click on the “Stale Configuration: Restart Needed” icon.

Follow the steps to restart the cluster. When the restart has completed, click Finish.

2. Analyze the script you will run in this exercise.

To make it easier to start and stop MapReduce jobs during this exercise, a script has been provided. View the script to gain an understanding of what it does.

On elephant:

$ cd ~/training_materials/admin/scripts

$ cat pools.sh

The script will start or stop a job in the pool you specify. It takes two parameters. The first is the pool name and the second is the action to take (start or stop). Each job it runs will be relatively long-running and consist of 10 mappers and 10 reducers.

Note: Remember, we use the terms pool and queue interchangeably.

3. Start three Hadoop jobs, each in a different pool.

On elephant:

$ ./pools.sh pool1 start

$ ./pools.sh pool2 start

$ ./pools.sh pool3 start

It is recommended that you go through this exercise by starting jobs only when prompted in the instructions. However, depending on how quickly you complete the steps, a job may have completed earlier than the instructions anticipated. Therefore, please note that at any time during this exercise you can start additional jobs in any pool using any of the three commands you ran in this step.

4. Verify the jobs started.

In Cloudera Manager, browse to Clusters > YARN Applications.

You should notice that the three jobs have started. If the jobs do not yet display in the page, wait a moment and then refresh the page.

Note: Cloudera Manager does refresh pages automatically; however, in this exercise you may find it useful to refresh the pages manually to more quickly observe the latest status of pools and jobs running in the pools.

Leave this browser tab open.

5. Observe the status of the pools.

Open another browser tab, and in Cloudera Manager, browse to Clusters > Dynamic Resource Pools.

Analyze the details in the “Resource Pools Usage” table.

If pool1, pool2, and pool3 do not yet display, refresh the browser tab.

The table displays the pools in the cluster. The pools you submitted jobs to should have pending containers and allocated containers.

The table also shows the amount of memory and vcores that have been allocated to each pool.

6. Analyze the Per Pool charts.

On the same page in Cloudera Manager, notice the Per Pool Shares charts. There is one chart for Fair Share Memory and another for Fair Share VCores.

If nothing is displaying in these charts yet, wait a moment and they will. Optionally refresh the browser page.

Leave this browser tab open.

7. Start another job in another pool.

In the same shell session on elephant:

$ ./pools.sh pool4 start

8. Back in Cloudera Manager, observe the resource allocation effect of starting a new job in a new pool.

Occasionally refresh the Dynamic Resource Pools page.

Some pools may be initially over their fair share because the first jobs to run will take all available cluster resources.

However, over time, notice that the jobs running over fair share begin to shed resources, which are reallocated to other pools to approach fair share allocation for all pools.

Tip: Mouse over any one of the “Per Pool..” charts and then click the double-arrow icon to expand the chart size.

9. Conduct further experiments.

Stop the job running in pool1.

$ ./pools.sh pool1 stop

Wait a minute or two, then observe the results in the charts on the Dynamic Resource Pools page.

Start a second job in pool3.

$ ./pools.sh pool3 start

Again observe the results.

10. Configure a Dynamic Resource Pool for pool2.

In the Dynamic Resource Pools page, click into the Configuration tab and click on the Resource Pools tab.

Click on “Add Resource Pool”.

In the General tab, set the Resource Pool Name to pool2. Keep DRF as the scheduling policy.

In the YARN tab, configure the following settings:

• Weight: 2
• Virtual Cores Min: 1
• Virtual Cores Max: 2
• Memory Min: 2400
• Memory Max: 5000

Click OK to save the changes.

11. Observe the effect of the new Dynamic Resource Pool on the resource allocations within the cluster.

In the YARN applications page, check how many jobs are still running and in which pools they are running.

Use the pools.sh script to start or stop jobs so that there is one job running in each of the four pools.

Return to the Dynamic Resource Pools page’s Status tab to observe the effect of the pool settings you defined.

As you continue to observe the Per-Pool Shares chart, you should soon see that pool2 is given a greater share of resources.
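As a rough illustration of why pool2 ends up with a larger share, the sketch below computes weight-proportional shares. It assumes the four pools are named pool1 through pool4 (pool4 is a hypothetical name) and that every pool other than pool2 keeps a default weight of 1. It ignores the min/max limits set above, which can cap what a pool actually receives.

```shell
# weighted_shares: read "pool weight" pairs on stdin and print each pool's
# share of the cluster as a percentage, proportional to its weight.
# This is only the steady-state arithmetic, not the scheduler itself.
weighted_shares() {
  awk '{ name[NR] = $1; weight[NR] = $2; total += $2 }
       END {
         for (i = 1; i <= NR; i++)
           printf "%s %.1f\n", name[i], 100 * weight[i] / total
       }'
}

# Example (pool names and default weights are assumptions):
#   printf 'pool1 1\npool2 2\npool3 1\npool4 1\n' | weighted_shares
```

With one job running in every pool, a weight of 2 against three weights of 1 works out to two fifths of the contended resources for pool2.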

12. Clean up.

When you are done observing the behavior of the Fair Scheduler, stop all running jobs either by using the pools.sh script or by killing the applications from the YARN Applications page in Cloudera Manager.

This is the end of the Exercise.

Hands-On Exercise: Breaking The Cluster

In this exercise, you will see what happens during failures of portions of the Hadoop cluster.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

1. Verify the existence of a large file in HDFS.

In a previous exercise you placed the weblog files in HDFS. Verify the files are there.

On elephant:

$ hdfs dfs -ls weblog

Only if you do not see the access_log file in HDFS, place it there now.

On elephant:

$ cd ~/training_materials/admin/data

$ hdfs dfs -mkdir weblog

$ gunzip -c access_log.gz \
| hdfs dfs -put - weblog/access_log

2. Locate a block that has been replicated on elephant as follows:

In the NameNode Web UI, navigate to the /user/training/weblog/access_log file. The File Information window appears.

Locate the Availability section in the File Information window for Block 0. You should see three hosts on which Block 0 is available. If one of the replicas is on elephant, note its Block ID. You will need to refer to the Block ID in the next exercise.

If none of Block 0’s replicas are on elephant, view the replication information for other blocks in the file until you locate a block that has been replicated on elephant. Once you have located a block that has been replicated on elephant, note its Block ID.

We will revisit this block when the NameNode recognizes that one of the DataNodes is a ‘dead node’ (after 10 minutes).

3. Now, intentionally cause a failure and observe what happens.

Using Cloudera Manager, from the HDFS Instances page, stop the DataNode running on elephant.

4. Visit the NameNode Web UI again and click on ‘Datanodes’. Refresh the browser several times and notice that the ‘Last contact’ value for the elephant DataNode keeps increasing.

5. Run the HDFS file system consistency check to see that the NameNode currently thinks there are no problems.

On elephant:

$ sudo -u hdfs hdfs fsck /
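The full fsck report can be long. A small filter like the sketch below keeps only the lines that matter for these exercises. The line labels it matches (Status, Under-replicated blocks, Over-replicated blocks, Missing replicas) are an assumption about the report layout on this cluster's Hadoop version, so adjust the pattern if your output differs.

```shell
# fsck_summary: read an fsck report on stdin and keep only the overall
# status and the replication counters. The label text is assumed, not
# guaranteed, to match every Hadoop release.
fsck_summary() {
  grep -E 'Status:|Under-replicated blocks:|Over-replicated blocks:|Missing replicas:'
}

# Usage on elephant (not run here):
#   sudo -u hdfs hdfs fsck / | fsck_summary
```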

6. Wait for at least ten minutes to pass before starting the next exercise.

(optional) in any terminal:

$ sleep 600 && echo "10 minutes have passed."

This is the end of the Exercise.

Hands-On Exercise: Verifying The Cluster’s Self-Healing Features

In this exercise, you will see what has happened to the data on the dead DataNode.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

1. In the NameNode Web UI, click on Datanodes and confirm that you now have one ‘dead node.’

2. View the location of the block from the access_log file you investigated in the previous exercise. Notice that Hadoop has automatically re-replicated the data to another host to retain three-fold replication.

3. In Cloudera Manager, choose Clusters > Reports.

Click on “Custom report” at the bottom of the Disk Usage (HDFS) section.

Build a report with the following two filters (use the plus sign to add the second filter):

• Replication < 4
• Group equal to hadoop

Click “Generate Report”. You should see that all files match the criteria, since the replication factor for your cluster is set to three.

Change the Replication filter from < 4 to < 3 and generate the report again.

Now there should be no files that match the criteria, since data stored on elephant was replicated over to one of the other three DataNodes.

4. View charts that show when replication occurred.

From the Cloudera Manager HDFS page, choose Charts > Charts Library.

In the search box, type in replication. This will show you the “Pending Replication Blocks” and “Scheduled Replication Blocks” charts.

Note the spike in activity that occurred after the DataNode went down.

5. View the audit and log trails in Cloudera Manager.

In Cloudera Manager, click on Audits.

Note the timestamp for when the HDFS service was stopped.

Choose Diagnostics > Logs, select sources HDFS only, set the Minimum Log Level to “INFO”, and enter the search term “replicate”.

Click Search.

Scroll to the bottom. Notice the log messages related to blocks being replicated.

6. Run the hdfs fsck command again to observe that the file system is still healthy.

On elephant:

$ sudo -u hdfs hdfs fsck /

7. Run the hdfs dfsadmin -report command to see that one dead DataNode is now reported.

On elephant:

$ sudo -u hdfs hdfs dfsadmin -report
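If you only want the dead-node count rather than the whole report, a filter along these lines can pull it out. It is a sketch that assumes the report contains a header of the form "Dead datanodes (N):". The header wording differs between Hadoop releases, so check your own output before relying on it.

```shell
# dead_datanodes: read a dfsadmin report on stdin and print the number of
# dead DataNodes. Prints 0 when no "Dead datanodes (N):" header is found
# (the header format is an assumption about this Hadoop version).
dead_datanodes() {
  local n
  n="$(sed -n 's/^Dead datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1)"
  echo "${n:-0}"
}

# Usage on elephant (not run here):
#   sudo -u hdfs hdfs dfsadmin -report | dead_datanodes
```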

8. Use Cloudera Manager to restart the DataNode on elephant, bringing your cluster back to full strength.

9. Run the hdfs fsck command again to observe the temporary over-replication of blocks.

On elephant:

$ sudo -u hdfs hdfs fsck /

Note that the over-replication situation will resolve itself (if it has not already) now that the previously unavailable DataNode is once again running.

If the command above did not show any over-replicated blocks, go to Diagnostics > Logs in Cloudera Manager and search the HDFS source for “ExcessRepl”. You should find evidence of the temporary over-replication in the log entries.

This is the end of the Exercise.

Hands-On Exercise: Taking HDFS Snapshots

In this exercise, you will enable HDFS snapshots on a directory and then practice restoring data from a snapshot.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

1. Enable snapshots on a directory in HDFS.

In Cloudera Manager, go to the HDFS page for your cluster and click File Browser.

Browse to /user/training, then click Enable Snapshots.

In the “Enable Snapshots” window, keep the Snapshottable Path set to /user/training and click Enable Snapshots.

The command completes. Notice in the message displayed on the “Program:” line that snapshots can also be enabled from the command line using the hdfs dfsadmin tool.

Click Close. Notice that there is now a “Take Snapshot” button.

2. Take a snapshot.

Still in the Cloudera Manager File Browser at /user/training, click “Take Snapshot”. Give it the name snap1 and click OK.

After the snapshot completes, click Close.

The snapshot section should now show your “snap1” listing.
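For reference, steps 1 and 2 can also be done from the command line. The sketch below wraps the standard hdfs dfsadmin -allowSnapshot and hdfs dfs -createSnapshot commands in a helper. The helper names run and snapshot_cli and the DRY_RUN switch are illustrative, not part of the course materials. Running allowSnapshot as the hdfs superuser through sudo is an assumption about how this cluster is set up.

```shell
# run: print a command, and execute it unless DRY_RUN is set to a
# non-empty value. (Helper name and DRY_RUN are hypothetical.)
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then "$@"; fi
}

# snapshot_cli DIR NAME: enable snapshots on DIR, take a snapshot called
# NAME, then list the snapshots that exist under DIR.
snapshot_cli() {
  local dir="$1" name="$2"
  run sudo -u hdfs hdfs dfsadmin -allowSnapshot "$dir"
  run hdfs dfs -createSnapshot "$dir" "$name"
  run hdfs dfs -ls "$dir/.snapshot"
}

# Preview the commands without touching the cluster:
#   DRY_RUN=1 snapshot_cli /user/training snap1
```

If you have already completed steps 1 and 2 in Cloudera Manager, use the dry-run form only, since creating a second snapshot named snap1 on the same directory would fail.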

3. Delete data from /user/training, then restore data from the snapshot.

Now let’s see what happens if we delete some data.

On elephant:

$ hdfs dfs -rm -r weblog

$ hdfs dfs -ls /user/training

The second command should show that the weblog directory is now gone.

However, your weblog data is still available, which you can see by running the commands here:

$ hdfs dfs -ls /user/training/.snapshot/snap1

$ hdfs dfs -tail .snapshot/snap1/weblog/access_log

Restore a copy of the weblog directory to the original location and then verify it is back in place.

$ hdfs dfs -cp .snapshot/snap1/weblog weblog

$ hdfs dfs -ls /user/training

This is the end of the Exercise.

Hands-On Exercise: Configuring Email Alerts

In this exercise, you will configure Cloudera Manager to use an email server to send alerts.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

1. Configure Cloudera Manager to send email alerts using the email server on lion.

In Cloudera Manager, choose Clusters > Cloudera Management Service.

Click on Configuration and then choose “Alert Publisher Default Group”.

Confirm the “Alerts: Enable Email Alerts” property is checked.

Configure the following:

• Alerts: Mail Server Username: training
• Alerts: Mail Server Password: training
• Alerts: Mail Message Recipients: training@localhost
• Alerts: Mail Message Format: text

Save the changes.

2. Restart the Cloudera Management Service.

3. Send a test alert from Cloudera Manager.

In Cloudera Manager, go to Administration > Alerts. You should see that the recipient(s) of alerts is now set to training@localhost.

Click on the “Send Test Alert” button at the top of the page.

4. Confirm emails are being received from Cloudera Manager.

The postfix email server is running on lion. Here you use the mail command line client to access the training user’s inbox.

On lion:

$ mail

The “Test Alert” email should show as unread (U).

At the & prompt, type in the number that appears to the right of the U and hit the <Enter> key so you can read the email.

After you are done reading the email, type q <Enter> to exit the mail client.

This is the end of the Exercise.

Troubleshooting Challenge: Heap O’ Trouble

It’s 8:30 AM and you are enjoying your first cup of coffee. Keri, who is making the transition from writing RDBMS stored procedures to coding Java MapReduce, shows up in your doorway before you’re even halfway through that first cup.

“I just tried to run a MapReduce job and I got an out of memory exception. I heard that there was 32GB on those new machines you bought. But when I run this stupid job, I keep getting out of memory errors. Isn’t 32GB enough memory? If I don’t fix this thing, I’m going to be in a heap of trouble. I told my manager I was 99% complete with my project but now I’m not even sure if I can do what I want to do with Hadoop.”

Put down your coffee and see if you can help Keri get her job running.

IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:

$ ~/training_materials/admin/scripts/reset_cluster.sh

Recreating the Problem

1. Confirm files in HDFS.

On elephant:

$ hdfs dfs -ls /tmp/shakespeare.txt

The command above should show that shakespeare.txt is in HDFS.

Only if shakespeare.txt was not found, run these commands to place the file in HDFS.

On elephant:

$ cd ~/training_materials/admin/data

$ gunzip shakespeare.txt.gz

$ hdfs dfs -put shakespeare.txt /tmp

On elephant:

$ hdfs dfs -ls weblog/access_log

The command above should confirm that the access_log file exists.

Only if access_log was not found, run these commands to place the file in HDFS.

On elephant:

$ cd ~/training_materials/admin/data

$ hdfs dfs -mkdir weblog

$ gunzip -c access_log.gz \
| hdfs dfs -put - weblog/access_log

2. Run the Heap of Trouble program.

On elephant:

$ cd ~/training_materials/admin/java

$ hadoop jar EvilJobs.jar HeapOfTrouble \
/tmp/shakespeare.txt heapOfTrouble

Attacking the Problem

The primary goal of this and all the other troubleshooting exercises is to start to become more comfortable analyzing problem scenarios by using Hadoop’s log files and Web UIs. Although you might be able to determine the source of the problem and fix it, doing so successfully is not the primary goal here.

Take as many actions as you can think of to troubleshoot this problem. Please write down the actions that you take while performing this challenge so that you can share them with other members of the class when you discuss this exercise later.

Fix the problem if you are able to.

Do not turn to the next page unless you are ready for some hints.

Some Questions to Ask While Troubleshooting a Problem

This list of questions provides some steps that you could follow while troubleshooting a Hadoop problem. Not all of the steps necessarily apply to all Hadoop issues, but this list is a good place to start.

• What is there that is different in the environment that was not there before the problem started occurring?
• Is there a pattern to the failure? Is it repeatable?
• If a specific job seems to be the cause of the problem, locate the task logs for the job, including the ApplicationMaster logs, and review them. Does anything stand out?
• Are there any unexpected messages in the NameNode, ResourceManager, and NodeManager logs?
• How is the health of your cluster?
    • Is there adequate disk space?
    • More specifically, does the /var/log directory have adequate disk space?
    • Might this be a swapping issue?
    • Is network utilization extraordinarily high?
    • Is CPU utilization extraordinarily high?
• Can you correlate this event with any of the issues?
• If it seems like a Hadoop MapReduce job is the cause of the problem, is it possible to get the source code for the job?
• Does searching the Web for the error provide any useful hints?
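Several of the cluster-health questions above can be answered quickly from a terminal on each host. The sketch below uses standard Linux tools (df, free, uptime). The function names and the choice of what to print are illustrative, not part of the course scripts.

```shell
# use_pct: read `df -P` output on stdin and print the usage percentage of
# the first listed filesystem as a bare number (for example 42).
use_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# node_health: print a quick health snapshot for the local host.
# Defined here but not run automatically.
node_health() {
  echo "/var/log usage: $(df -P /var/log | use_pct)%"
  echo "memory and swap (MB):"
  free -m
  echo "load averages:"
  uptime
}

# Usage on any cluster host:
#   node_health
```

High swap usage in the free output or a nearly full /var/log are the kinds of findings worth writing down for the class discussion.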

Fixing the Problem

If you have time and are able to, fix the problem so that Keri can run her job.

Post-Exercise Discussion

After some time has passed, your instructor will ask you to stop troubleshooting and will lead the class in a discussion of troubleshooting techniques.

This is the end of the Exercise.

Appendix A: Setting up VMware Fusion on a Mac for the Cloud Training Environment

When performing the Hands-On Exercises for this course, you use a small CentOS virtual machine called Get2EC2. This VM is configured to use NAT networking. You connect to Amazon EC2 instances from the guest OS by starting SSH sessions. The Get2EC2 VM is supported for VMware or VirtualBox.

VMware Fusion, like other hypervisors, runs an internal DHCP server for NAT-ted guests that assigns IP addresses to the guests. From time to time, the internal DHCP server releases and renews the guests’ leases. Unfortunately, the internal DHCP server in VMware Fusion does not always assign the same IP address to a guest that it had prior to the release and renew, and the Get2EC2 VM’s IP address changes.

Changing the IP address results in problems for active SSH sessions. Sometimes the terminal window in which the client is running will freeze up, becoming unresponsive to mouse and keyboard input, and no longer displaying standard output. At other times, sessions will be shut down with a Broken Pipe error. If this happens, you will have to re-open any failed sessions.

If you are using VMware Fusion on a Mac to perform the Hands-On Exercises for this course, you need to decide whether you would prefer to take action or do nothing:

• If you have administrator privileges on your Mac, you can configure VMware Fusion to use a fixed IP address. The instructions for configuring VMware Fusion to use a fixed IP address appear below.

• If you have VirtualBox installed, you can use the VirtualBox Get2EC2 VM instead of VMware Fusion.

• You can do nothing, in which case you might encounter terminal freezes as described above.

To configure VMware Fusion to use a fixed IP address, perform the following steps:

1. Start VMware Fusion.

2. Create an entry in the Virtual Machines list for the Get2EC2 VM. To create the entry, drag the Cloudera-Training-Get2EC2-VM-1.0.vmx file to the Virtual Machines list.

You should see the Cloudera-Training-Get2EC2-VM-1.0 entry in the Virtual Machines list. We will refer to the Cloudera-Training-Get2EC2-VM-1.0 VM as the Get2EC2 VM.

3. Make sure the Get2EC2 VM is powered down.

4. Click once on the Get2EC2 VM entry in the Virtual Machines list to select the VM.

Note: If you accidentally double-click the entry, you start the VM. Before you proceed to the next step, power down the VM.

5. Click the Settings icon in the VMware Fusion Toolbar (or select Virtual Machines > Settings).

6. Click Network Adapter.

7. Click Advanced Options.

The MAC Address field appears.

8. If the MAC Address field is empty, click Generate to generate a MAC address for the Get2EC2 VM.

9. Copy the MAC address and paste it into a file where you can access it later. You will need to use the MAC address in a subsequent step.

10. Open the following file on your Mac using superuser (sudo) privileges:

• VMware Fusion 4 and higher: /Library/Preferences/VMware Fusion/vmnet8/dhcpd.conf

• VMware Fusion 3: /Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf

Look for the range statement. It should have a range of IP addresses. For example:

range 172.16.73.128 172.16.73.254;

11. Choose an IP address for the Get2EC2 VM. The IP address should have the same first three octets as the IP addresses in the range statement, but the fourth octet should be outside of the addresses in the range statement. Given the example of the range statement in the previous step, you would choose an IP address that starts with 172.16.73 and ends with a number lower than 128 (but not 0, 1, or 2; those numbers are reserved for other purposes).

For example, 172.16.73.10.

12. Add four lines to the bottom of the dhcpd.conf file as follows:

host Get2EC2 {
    hardware ethernet <MAC_Address>;
    fixed-address <IP_Address>;
}

Replace <MAC_Address> with the MAC address you generated in an earlier step.

Replace <IP_Address> with the IP address you chose in the previous step.

Be sure to include the semicolons after the MAC and IP addresses as shown in the example.
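To double-check the address choice from step 11 and avoid typos in the entry from step 12, you could generate the four lines with a small helper such as the sketch below. The function name make_host_entry is hypothetical, and the MAC address in the usage comment is a made-up placeholder. The checks encode only the rules stated above: same first three octets as the range, last octet outside the range, and not 0, 1, or 2.

```shell
# make_host_entry MAC IP RANGE_START RANGE_END
# Print a dhcpd.conf host entry for the Get2EC2 VM, or fail with a message
# when IP breaks the rules from step 11.
make_host_entry() {
  local mac="$1" ip="$2" range_start="$3" range_end="$4"
  local prefix="${ip%.*}" last="${ip##*.}"
  local lo="${range_start##*.}" hi="${range_end##*.}"

  # The first three octets must match those of the range statement.
  if [ "$prefix" != "${range_start%.*}" ]; then
    echo "error: $ip is not in the ${range_start%.*} network" >&2
    return 1
  fi
  # 0, 1, and 2 are reserved; values above 254 are not usable host addresses.
  if [ "$last" -le 2 ] || [ "$last" -gt 254 ]; then
    echo "error: last octet $last is reserved or out of bounds" >&2
    return 1
  fi
  # The last octet must fall outside the DHCP range.
  if [ "$last" -ge "$lo" ] && [ "$last" -le "$hi" ]; then
    echo "error: $ip falls inside the DHCP range" >&2
    return 1
  fi

  printf 'host Get2EC2 {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' "$mac" "$ip"
}

# Example (placeholder MAC address; substitute the one you copied earlier):
#   make_host_entry 00:50:56:00:00:01 172.16.73.10 172.16.73.128 172.16.73.254
```

The output can be reviewed and then pasted at the bottom of dhcpd.conf.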

13. Save and close the dhcpd.conf file.

14. Run the following commands from a terminal window on your Mac:

For VMware Fusion 4 or higher:

$ sudo /Applications/VMware\ Fusion.app/Contents/\
Library/vmnet-cli --stop

$ sudo /Applications/VMware\ Fusion.app/Contents/\
Library/vmnet-cli --start

For VMware Fusion 3:

$ sudo /Library/Application\ Support/VMware\ Fusion/\
boot.sh --restart

15. Start the Get2EC2 VM.

16. After the VM has come up, run the following command in a Linux terminal window:

$ ip addr

Verify that the IP address that appears is the IP address that you specified in the dhcpd.conf file.

This is the end of this Appendix.

Appendix B: Setting up VirtualBox for the Cloud Training Environment

Follow these steps to set up VirtualBox for the cloud training environment if you do not want to install VMware Fusion on your Mac.

VMware Fusion is our preferred hypervisor for students running this course on Mac OS. Please use VMware Fusion if possible. Use VirtualBox for this course only if it is your preferred virtualization environment and if you are knowledgeable enough to troubleshoot, on your own, any problems you might run into.

This setup activity comprises:

• Creating the Get2EC2 VM

• Powering up the VM

• Installing VirtualBox Guest Additions on the VM

This setup requires VirtualBox version 4 or higher.

1. Get the .vmdk file for the class from your instructor and copy it onto the system on which you will be doing the Hands-On Exercises.

2. Start VirtualBox.

3. Select Machine > New.

The Name and Operating System dialog box appears.

4. In the Name and Operating System dialog box, specify Get2EC2 as the Name, Linux as the Type, and Red Hat (not Red Hat 64-bit) as the Version. Click Continue.

The Memory Size dialog box appears.

5. In the Memory Size dialog box, accept the default memory size of 512 MB and click Continue.

The Hard Drive dialog box appears.

6. In the Hard Drive dialog box, select “Use an existing virtual hard drive file.”

7. In the field under the “Use an existing virtual hard drive file” selection, navigate to the .vmdk file for the class and click Open.

8. Click Create.

The Oracle VM VirtualBox Manager dialog box appears. The Get2EC2 VM appears on the left side of the dialog box, with the status Powered Off.

9. Click Start to start the Get2EC2 VM.

The VM starts up. After startup is complete, the GNOME interface appears. You are automatically logged in as the training user.

Now you are ready to install VirtualBox Guest Additions on the VM.

Note: The version of VirtualBox and Guest Additions must be the same. You must install Guest Additions now to guarantee compatibility between your version of VirtualBox and Guest Additions.

10. Select Devices > Install Guest Additions.

After several seconds, a dialog box appears prompting you to select how you want to install the version of VBOXADDITIONS on your system. Verify that Open Autorun Prompt is selected as the Action, then click OK.

11. Another dialog box appears prompting you to confirm you want to run the Guest Additions installer. Click Run.

12. The Authenticate dialog box appears, prompting you to enter the root user’s password. Specify training and click Authenticate.

13. Messages appear in the terminal window while VirtualBox is building and installing the Guest Additions.

When installation is complete, the message “Press Return to close this window” appears in the terminal window.

14. Press Return.

15. Select System > “Log Out training” to log out of your GNOME session.

After you have logged out, you are automatically logged back in as the training user.

You have completed the VirtualBox setup. Please return to the next step in “Configuring Networking on Your Cluster: Cloud Training Environment” and continue the setup activity for the cloud training environment.

This is the end of this Appendix.

