Copyright © 2010-2015 Cloudera, Inc. All rights reserved. Not to be reproduced without prior written consent from Cloudera.
Cloudera Administrator Training for Apache Hadoop: Hands-On Exercises
General Notes
Setup Activity: Configuring Networking
Hands-On Exercise: Installing Cloudera Manager Server
Hands-on Exercise: Creating a Hadoop Cluster
Hands-On Exercise: Working With HDFS
Hands-On Exercise: Running YARN Applications
Hands-On Exercise: Explore Hadoop Configurations and Daemon Logs
Hands-On Exercise: Using Flume to Put Data into HDFS
Hands-On Exercise: Importing Data with Sqoop
Hands-On Exercise: Querying HDFS With Hive and Cloudera Impala
Hands-On Exercise: Using Hue to Control Hadoop User Access
Hands-On Exercise: Configuring HDFS High Availability
Hands-On Exercise: Using the Fair Scheduler
Hands-On Exercise: Breaking The Cluster
Hands-On Exercise: Verifying The Cluster’s Self-Healing Features
Hands-On Exercise: Taking HDFS Snapshots
Hands-On Exercise: Configuring Email Alerts
Troubleshooting Challenge: Heap O’ Trouble
201510
Appendix A: Setting up VMware Fusion on a Mac for the Cloud Training Environment
Appendix B: Setting up VirtualBox for the Cloud Training Environment
General Notes
Training Environment Overview
In this training course, you will use a cloud environment consisting of five Amazon EC2 instances. You will also use a local virtual machine (VM) to access the cloud environment.
All of the EC2 instances use the CentOS 6.4 Linux distribution.
Use the training user account to do your work. You should not need to enter passwords for the training user.
Should you need superuser access, you can use sudo as the training user without entering a password. The training user has unlimited, passwordless sudo privileges.
For the training environment:
• You will start the VM, and then use it to connect to your EC2 instances
• The EC2 instances have been configured so that you can connect to them without entering a password
• Your instructor will provide the following details for five EC2 instances per student:
o EC2 public IP addresses – You will run a script that adds the EC2 public IP addresses of your five EC2 instances to the /etc/hosts file on your VM
o EC2 private IP addresses – You will use these addresses when you run a script that configures the /etc/hosts file on your EC2 instances. EC2 private IP addresses start with the number 10, 172, or 192.168
o The instructor will not provide EC2 internal host names. Please use the host names elephant, tiger, horse, monkey and lion for the five internal host names. It is important that you use these five host names, because several scripts, which expect these five host names, have been provided for your use while you perform exercises. The scripts will not work if you use different host names.
• Please write out a table similar to the following with your EC2 instance information:
Host Name    EC2 Public IP Address    EC2 Private IP Address
elephant
tiger
horse
monkey
lion
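When filling in the table, it can help to check which column an address belongs in. The following is a minimal sketch, not part of the course scripts. It assumes the standard private address ranges (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16); the function name is_private_ip and the sample addresses are made up for illustration.

```shell
# Hypothetical helper: succeeds if the IPv4 address passed as $1 falls in
# one of the standard private ranges.
is_private_ip() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample (made-up) addresses: the first two are private, the last is public.
for ip in 10.0.3.17 172.31.8.2 54.12.3.4; do
  if is_private_ip "$ip"; then
    echo "$ip private"
  else
    echo "$ip public"
  fi
done
```

Note that only part of the 172 range is private (172.16 through 172.31), so an address beginning with 172 is not automatically a private address.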
Notational Convention
In some command-line steps in the exercises, you will see lines like this:
$ hdfs dfs -put shakespeare \
/user/training/shakespeare
The backslash at the end of the first line signifies that the command is not completed, and continues on the next line. You can enter the code exactly as shown (on two lines), or you can enter it on a single line. If you do the latter, you should not type in the backslash.
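To see that the two forms are equivalent, you can try the following sketch in any shell. It uses echo in front of the command so that nothing is actually sent to HDFS; the variable names are chosen only for this illustration.

```shell
# The same command entered on one line and on two lines with a backslash.
one_line=$(echo hdfs dfs -put shakespeare /user/training/shakespeare)
two_lines=$(echo hdfs dfs -put shakespeare \
  /user/training/shakespeare)

# The shell joins the continued line before running it, so both are equal.
if [ "$one_line" = "$two_lines" ]; then
  echo "identical"
fi
```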
Copying and Pasting from the Hands-On Exercises Manual
If you wish, you can usually copy commands and strings from this Hands-On Exercises manual and paste them into your terminal sessions. However, please note one important exception:
If you copy and paste table entries that exceed a single line, a space may be inserted at the end of each line.
Dash (-) characters are especially problematic and do not always copy correctly. Be sure to delete any pasted dash characters and key them in manually after pasting in text that contains them.
Please use caution when copying and pasting when performing the exercises.
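A common form of the dash problem is that a plain hyphen arrives as a typographic en dash or em dash, which the shell does not treat as an option prefix. The following sketch shows one way to normalize pasted text before using it. The function name fix_dashes is hypothetical, and retyping the dashes by hand, as described above, remains the safer approach.

```shell
# Hypothetical filter: replaces en dashes (U+2013) and em dashes (U+2014)
# on standard input with ASCII hyphens.
fix_dashes() {
  en_dash=$(printf '\342\200\223')   # UTF-8 bytes of U+2013
  em_dash=$(printf '\342\200\224')   # UTF-8 bytes of U+2014
  sed -e "s/$en_dash/-/g" -e "s/$em_dash/-/g"
}

# A pasted command whose option dash was turned into an en dash.
printf 'uname \342\200\223n\n' | fix_dashes   # prints: uname -n
```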
Resetting Your Cluster
You can use the reset_cluster.sh script to change the state of your cluster so that you can start with a fresh, correct environment for performing any exercise. Use the script in situations such as the following:
• While attempting one of the exercises, you misconfigure your machines so badly that attempting to do the next exercise is no longer possible.
• You have successfully completed a number of exercises, but then you receive an emergency call from work and you have to miss some time in class. When you return, you realize that you have missed two or three exercises. But you want to do the same exercise everyone else is doing so that you can benefit from the post-exercise discussions.
The script is destructive: any work that you have done is overwritten when the script runs. If you want to save files before running the script, you can copy the files you want to save to a subdirectory under /home/training.
Before you attempt to run the script, verify that networking among the five hosts in your cluster is working. If networking has not been configured correctly, you can rerun the CM_config_hosts.sh script to reset the networking configuration prior to running the reset_cluster.sh script.
Run the script on elephant only. You do not need to change to a directory to run the script; it is in your shell’s PATH.
The script starts by prompting you to enter the number of an exercise. Specify the exercise you want to perform after the script has run. Then confirm that you want to reset the cluster (thus overwriting any work you have done).
The script will further prompt you to specify if you want to run only the steps that simulate the previous exercise, or if you want to completely uninstall and reinstall the cluster and then catch you up to the specified exercise. Note that choosing to only complete the previous exercise does not offer as strong an assurance of properly configuring your cluster as a full reset would do. It is however a more expedient option.
After you have responded to the initial prompts, the script begins by cleaning up your cluster: terminating Hadoop processes, removing Hadoop software, deleting Hadoop-related files, and reverting other changes you might have made to the hosts in your cluster. Please note that as this system cleanup phase is running, you will see errors such as “unrecognized service” and “No packages marked for removal.” These errors are expected. They occur because the script attempts to remove anything pertinent that might be on your cluster. The number of error messages that appear during this phase of script execution depends on the state the cluster is in when you start the script.
Next, the script simulates steps for each exercise up to the one you want to perform.
Script completion time varies from 5 minutes to almost an hour depending on how many exercise steps need to be simulated by the script.
Setup Activity: Configuring Networking
In this preparatory exercise you will configure networking for your five instances.
Task Overview
In this task, you will run scripts to configure networking.
First, you will start the local Get2EC2 VM and run a script to configure the /etc/hosts file on that VM with the addresses of the five EC2 instances and the host names elephant, tiger, horse, monkey and lion.
Next, you will run a script to configure the /etc/hosts file on the EC2 instances, setting the five host names to elephant, tiger, horse, monkey and lion.
Then you will verify that networking has been set up correctly.
Finally, you will run a script that starts a SOCKS5 proxy server on the local VM.
Steps to Complete
1. If you are using VMware Fusion on Mac OS to connect to your five EC2 instances, read ‘Appendix A, Setting up VMware Fusion on a Mac for the Cloud Training Environment’ and perform any indicated actions. After you have done so, continue to the next step.
2. Start the Get2EC2 VM.
You should find the VM on your Desktop, on the C: drive of your machine, or in the location to which you downloaded it. Double-click the icon whose name ends in .vmx to launch the VM.
After a few minutes, the GNOME interface will appear.
Note: A VirtualBox VM is available for students running Mac OS who are
unable to or prefer not to install VMware Fusion. However, we strongly
recommend using the VMware version VM. Use VirtualBox for this course
only if it is your preferred virtualization environment and if you are
knowledgeable enough to be self-sufficient to troubleshoot problems you
might run into.
If you are using a VirtualBox VM, follow the steps in ‘Appendix B, Setting up
VirtualBox for the Cloud Training Environment.’ When you have completed
the steps in the Appendix, continue to the next step.
3. Run a script that modifies the /etc/hosts file on your VM by adding your five EC2 instances.
Enter the following command in the terminal window in the VM:
$ CM_config_local_hosts_file.sh
The CM_config_local_hosts_file.sh script will prompt you to enter the five EC2 public IP addresses for your five EC2 instances. Refer to the table you created when your instructor gave you the EC2 instance information.
4. Log in to the elephant EC2 instance as the training user.
$ connect_to_elephant.sh
When prompted to confirm that you want to continue connecting, enter yes and then press Enter.
5. Run the CM_config_hosts.sh script on elephant. This script sets up the /etc/hosts file on elephant, copies that file to tiger, horse, monkey and lion, and then sets the host names for the five hosts to elephant, tiger, horse, monkey and lion.
Enter the following command on elephant:
$ CM_config_hosts.sh
When the script prompts you to enter IP addresses for your EC2 instances, enter the EC2 private IP addresses (which typically start with 10, 172, or 192.168).
When the script prompts you to confirm that you want to continue connecting, enter yes each time and then press Enter.
6. Terminate and restart the SSH session with elephant.
$ exit
$ connect_to_elephant.sh
Note: You exit and reconnect to elephant to reset the value of the $HOSTNAME variable in the shell after the CM_config_hosts.sh script changes the host name.
7. Start SSH sessions with your other four EC2 instances, logging in as the training user.
Open four more terminal windows (or tabs) on your VM, so that a total of five terminal windows are open.
In one of the new terminal windows, connect to the tiger EC2 instance.
$ connect_to_tiger.sh
When the script prompts you to confirm that you want to continue connecting, enter yes and then press Enter. (You will also need to do this when you connect to horse and monkey.)
In another terminal window, connect to the horse EC2 instance.
$ connect_to_horse.sh
In another terminal window, connect to the monkey EC2 instance.
$ connect_to_monkey.sh
In the remaining terminal window, connect to the lion EC2 instance.
$ connect_to_lion.sh
8. Verify that you can communicate with all the hosts in your cluster from elephant by using the host names.
On elephant:
$ ping elephant
$ ping tiger
$ ping horse
$ ping monkey
$ ping lion
9. Verify that passwordless SSH works by running the ip addr command on tiger, horse, monkey and lion from a session on elephant.
On elephant:
$ ssh training@tiger ip addr
$ ssh training@horse ip addr
$ ssh training@monkey ip addr
$ ssh training@lion ip addr
The ip addr command shows you IP addresses and properties.
Note: Your environment is set up to allow you to use passwordless SSH to
submit commands from a session on elephant to the root and training
users on tiger, horse, monkey, and lion, and to allow you to use the scp
command to copy files from elephant to the other four hosts. Passwordless
SSH and scp are not required to deploy a Hadoop cluster. We make these
tools available in the classroom environment as a convenience.
10. Run the uname -n command to verify that all five host names have been changed as expected.
On all five hosts:
$ uname -n
11. Verify that the value of the $HOSTNAME environment variable has been set to the new host name.
On all five hosts:
$ echo $HOSTNAME
The value elephant, tiger, horse, monkey or lion should appear.
12. Start a SOCKS5 proxy server on the VM. The browser on your VM will use the proxy to communicate with your EC2 instances.
Open one more terminal window on the local VM – not a tab in an existing window – and enter the following command:
$ start_SOCKS5_proxy.sh
When the script prompts you to confirm that you want to continue connecting, enter yes and then press Enter.
13. Minimize the terminal window in which you started the SOCKS5 proxy server.
Important: Do not close this terminal window or exit the SSH session running in it while you are working on exercises. If you accidentally terminate the proxy server, restart it by opening a new terminal window and rerunning the start_SOCKS5_proxy.sh script.
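The connectivity checks in steps 8 and 9 run one command per host. If you prefer a single pass over all five hosts, a small loop can do the same work. The following is a sketch only: check_hosts is a hypothetical helper, not one of the provided course scripts, and the ping -c option (which limits the number of packets so that ping stops by itself) is the one found on Linux systems such as CentOS.

```shell
# Hypothetical helper: runs a probe command against each host and reports
# the result. $1 is the probe command; the remaining arguments are hosts.
# Returns success only if the probe succeeded for every host.
check_hosts() {
  probe=$1
  shift
  failed=""
  for host in "$@"; do
    if $probe "$host" >/dev/null 2>&1; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      failed="$failed $host"
    fi
  done
  [ -z "$failed" ]
}

# Example use on elephant (left as a comment because it needs the cluster):
#   check_hosts "ping -c 2" elephant tiger horse monkey lion
```

Passing the probe as an argument keeps the loop reusable for other checks that take a host name as their last argument.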
At the End of Each Day
At the end of the day, exit all active SSH sessions you have with EC2 instances
(including the session running the SOCKS5 proxy server). To exit an SSH
session, simply execute the exit command.
CAUTION: Do not shut down your EC2 instances.
Then suspend the Get2EC2 VM.
When you return in the morning, resume the Get2EC2 VM, reestablish SSH
sessions with the EC2 instances, and restart the SOCKS5 proxy server.
To reestablish SSH sessions with the EC2 instances, open a terminal window (or
tab), and then execute the appropriate connect_to script.
To restart the SOCKS5 proxy server, open a separate terminal window, and then
execute the start_SOCKS5_proxy.sh script.
This is the end of the setup activity for the training environment.
Hands-On Exercise: Installing Cloudera Manager Server
With Cloudera Manager, you can install your Hadoop cluster using one of two installation options, referred to as Installation Path A and Installation Path B in the Cloudera Manager documentation. Installation Path A is driven entirely by GUI-based wizards and is appropriate for a demo or proof of concept installation. Installation Path B is command-line driven and is appropriate for production deployments.
For this installation, you will use Installation Path B to install Cloudera Manager Server on the lion machine. Before installing Cloudera Manager you will configure an external database (MySQL) to be used by Cloudera Manager and CDH, which you will install in the next exercise.
Complete all steps in this exercise on lion.
Verify Existing Environment
1. Verify the Oracle JDK is installed and that JAVA_HOME is defined and referenced in the system PATH.
On lion:
$ java -version
$ echo $JAVA_HOME
$ env | grep PATH
2. Verify Python is installed. It is a requirement for Hue, which you will install later in the course.
On lion:
$ sudo rpm -qa | grep python-2.6
3. Verify Oracle MySQL Server is installed and running on lion.
On lion:
$ chkconfig | grep mysqld
$ sudo service mysqld status
Note that in a true production deployment you would also modify the /etc/my.cnf MySQL configurations and move the InnoDB log files to a backup location.
4. Verify the MySQL JDBC Connector is installed. Sqoop (a part of CDH that you will install in this course) does not ship with a JDBC connector, but does require one.
On lion:
$ ls -l /usr/share/java
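For step 1 above, reading the output of env | grep PATH by eye works, but the check can also be scripted. The following sketch tests whether a directory appears as an entry in a colon-separated search path. The function name path_contains and the sample paths are hypothetical, and it assumes the JDK is referenced as $JAVA_HOME/bin with no trailing slash.

```shell
# Hypothetical helper: succeeds if directory $1 is an exact entry in the
# colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Demonstration with made-up values.
if path_contains /usr/java/default/bin "/usr/local/bin:/usr/bin:/usr/java/default/bin"; then
  echo "found"
fi

# On lion you might instead run:
#   path_contains "$JAVA_HOME/bin" "$PATH" && echo "JDK is on the PATH"
```

Wrapping both strings in colons ensures that only whole entries match, so /usr/bin does not match /usr/bin/extra.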
Configure the External Database
In keeping with the approach to installation that is appropriate for a production cluster environment, you will use an external database rather than the embedded PostgreSQL database option.
1. Create the required databases in MySQL.
On lion, open ~/training_materials/admin/scripts/mysql-setup.sql in a text editor and observe the script details. This script creates databases for Cloudera Manager, the Hive Metastore, the Activity Monitor, and the Reports Manager.
Note that if you were going to add the Sentry Server CDH service or install Cloudera Navigator, additional databases would also need to be created.
On lion:
$ mysql -u root < \
~/training_materials/admin/scripts/mysql-setup.sql
$ mysql -u root
Now, at the mysql> prompt on lion, issue the following commands:
show databases;
exit;
The ‘show databases’ command should show that the four databases were created.
Note: in a true production deployment you would also regularly back up your database.
2. Make your MySQL installation secure and set the root user password.
On lion:
$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] Y
New password: training
Re-enter new password: training
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] Y
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
3. Verify the Cloudera Manager local software repository.
Your instances contain a local yum repository of Cloudera Manager 5 software to save download time in this course.
CentOS (and Red Hat) store software repository references in /etc/yum.repos.d. Issue the command below to view the settings.
On lion:
$ cat /etc/yum.repos.d/cloudera-cm5.repo
View the software packages in a subfolder of the referenced directory.
On lion:
$ ls ~/software/cloudera-cm5/RPMS/x86_64
Note that these two locations exist on each of the five machines in your environment and have also been made available on HTTP ports 8050 and 8000 respectively via a Linux service. This setup is specific to the course environment and is not required for Cloudera Manager or CDH installations.
If you wanted to install from the online repository, you would create a reference to the Cloudera repository in your /etc/yum.repos.d directory.
Install Cloudera Manager Server
1. Install Cloudera Manager Server.
On lion:
$ sudo yum install -y cloudera-manager-daemons \
cloudera-manager-server
Note: The -y option provides an answer of yes in response to an expected confirmation prompt.
2. Set the Cloudera Manager Server service to not start on boot.
On lion:
$ sudo chkconfig cloudera-scm-server off
3. Run the script to prepare the Cloudera Manager database.
On lion:
$ sudo /usr/share/cmf/schema/scm_prepare_database.sh \
mysql cmserver cmserveruser password
After running the command above you should see the message, “All done, your SCM database is configured correctly!”
4. Verify the local software repositories.
Run the following two commands to verify the URLs are working.
On lion:
$ curl lion:8000
$ curl lion:8050
Each command should show hyperlinks (<a href="…" code) to software repositories. If either command did not successfully contact the web server, discuss with the instructor before continuing.
5. Start the Cloudera Manager Server.
On lion:
$ sudo service cloudera-scm-server start
6. Review the processes started by the Cloudera Manager Server.
On lion:
$ top
Cloudera Manager Server runs as a Java process that you can view by using the top Linux utility. Notice that its CPU usage is at 90 percent or above for a few seconds while the server starts. Once the CPU usage drops, the Cloudera Manager browser interface will become available.
On lion:
$ ps wax | grep [c]loudera-scm-server
The results of the ps command above show that Cloudera Manager Server is using the JDBC MySQL connector to connect to MySQL. It also shows logging configuration and other details.
7. Review the Cloudera Manager Server log file.
The path to the log file is /var/log/cloudera-scm-server/cloudera-scm-server.log. Note that you must use sudo to access Cloudera Manager logs because of restricted permissions on the Cloudera Manager log file directories.
On lion:
$ sudo less /var/log/cloudera-scm-server/cloudera-scm-\
server.log
Note: You will log in to Cloudera Manager in the next exercise.
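The ps command in step 6 searches for [c]loudera-scm-server rather than cloudera-scm-server. The brackets keep the grep process from matching its own command line. The sketch below illustrates the effect with made-up process lines instead of live ps output; the process IDs and the Java class name are invented for the example.

```shell
# Made-up process lines standing in for `ps wax` output.
server='1234 java com.example.Main cloudera-scm-server'
plain_grep='5678 grep cloudera-scm-server'
bracket_grep='5678 grep [c]loudera-scm-server'

# Plain pattern: matches the server and the grep that is searching for it.
printf '%s\n' "$server" "$plain_grep" | grep -c 'cloudera-scm-server'

# Bracketed pattern: [c] still matches the letter c, but the grep command
# line now contains "[c]loudera", which the pattern does not match.
printf '%s\n' "$server" "$bracket_grep" | grep -c '[c]loudera-scm-server'
```

The first command counts two matching lines and the second counts one. When typing the pattern at a prompt, quoting it as shown avoids any chance of the shell treating the brackets as a file name pattern.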
This is the end of the exercise.
Hands-on Exercise: Creating a Hadoop Cluster
In this task, you will log in to the Cloudera Manager Admin Console and use the Cloudera Manager services wizard to create a Hadoop cluster. The wizard will prompt you to identify the machines to be managed by Cloudera Manager. It will then install the Cloudera Manager Agent on each machine. At that point, your environment will be ready for the CDH software to be installed.
You will then be prompted to choose which CDH services you want to add in the cluster and to which machines you would like to add each service.
At the end of this exercise, you will have Hadoop daemons deployed across your cluster as depicted here (daemons added in this exercise shown in blue).
In subsequent Exercises, you will add more Hadoop services to your cluster.
Because you have only five hosts to work with, you will have to run multiple Hadoop daemons on all the hosts except lion, which will be used for Cloudera Manager services only. For example, the NameNode, a DataNode and a NodeManager will all run on elephant. Having a very limited number of hosts is not unusual when deploying Hadoop for a proof-of-concept project. Please follow the best practices in the “Planning Your Hadoop Cluster” chapter when sizing and deploying your production clusters.
After completing the installation steps, you will review a Cloudera Manager Agent log file and review processes running on a machine in the cluster. Finally, there is a section at the end of this exercise that provides a brief tour of the Cloudera Manager Admin UI.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command on elephant and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Login to Cloudera Manager Admin UI
1. Verify that you are running a SOCKS5 proxy server started with the start_SOCKS5_proxy.sh script. The proxy server should be running in a separate terminal window on the Get2EC2 VM.
2. Start the Cloudera Manager Admin Console from a browser. The URL is http://lion:7180.
If an Unable to Connect message appears, the Cloudera Manager server has not yet fully started. Wait several moments, and then attempt to connect again.
3. Log in with the username admin. The password is admin.
The “Welcome to Cloudera Manager” page appears with a table that provides details about editions of Cloudera Manager software.
Install Cloudera Manager Agents
1. A green box highlighting a product edition column should appear.
Select the Cloudera Enterprise Data Hub Edition Trial.
Click Continue.
2. The “Thank you for choosing Cloudera Manager and CDH” page appears.
Click Continue.
3. The “Specify hosts for your CDH cluster installation” page appears.
Type in the names of all five machines: elephant tiger horse monkey lion
Click Search. All five machines should be found. Ensure they are all selected.
Click Continue.
4. The “Cluster Installation – Select Repository” page appears.
Specify the following option:
• Choose Method - Use Parcels
Click the “More Options” button.
• In the “Remote Parcel Repository URLs” area, remove all the current lines by using the minus (-) sign button.
• Once the existing entries are all removed, click on the plus (+) sign to add a new URL.
• In the blank field that appears, type in http://lion:8000
• Click OK to return to the Select Repository page.
The ‘Select Repository’ page should now show that CDH-5.x.x-x.cdh5.x.x.px.xx (where each x is a version number) is available. Select this version of CDH.
In the “Select the specific release of the Cloudera Manager Agent you want to install on your hosts” area choose “Custom Repository”.
• In the blank field that appears, type in http://lion:8050
Click Continue.
5. The “Cluster Installation – JDK Installation Options” page appears.
A supported version of the Oracle JDK is already installed.
Verify the box is unchecked so that the installation of JDK will be skipped.
Click Continue.
6. The “Cluster Installation – Enable Single User Mode” page appears.
Keep the default setting (Single User Mode not enabled) and click Continue.
7. The “Cluster Installation – Provide SSH login credentials” page appears.
Keep “Login To All Hosts As” set to root.
For “Authentication Method”, choose “All hosts accept same private key”.
Click the Browse button. In the “Places” column choose “training”. Then, in the “Name” area, right-click (or on a Mac Ctrl-click), and select “Show Hidden Files.” Finally, click into the .ssh directory, select the id_rsa file and click Open.
Leave the passphrase fields blank and click Continue.
When prompted to continue with no passphrase click OK.
8. The “Cluster Installation – Installation in progress” page appears.
Cloudera Manager installs the Cloudera Manager Agent on each machine.
Once the installation successfully completes on all machines, click Continue.
Install Hadoop Cluster
1. The “Cluster Installation – Installing Selected Parcels” page appears.
The CDH parcel is downloaded, distributed, and activated on all hosts in the cluster.
The “distributing” action may take a few minutes since this involves copying the large CDH parcel file from the lion machine to all of the other machines in the cluster.
When it completes, click Continue.
2. The “Cluster Installation – Inspect hosts for correctness” page appears.
After a moment, output from the Host Inspector appears.
Click Finish.
3. The “Cluster Setup - Choose the CDH 5 services that you want to install on your cluster” page appears.
Click “Custom Services”.
A table appears with a list of CDH service types.
4. Select the HDFS and YARN (MR2 Included) service types.
You will add additional services in later exercises.
Click Continue.
5. The Cluster Setup – Customize Role Assignments page appears.
Assign the following roles.
Role                          Node(s)
HDFS
  NameNode                    elephant
  SecondaryNameNode           tiger
  Balancer                    horse
  HttpFS                      Do not specify any hosts
  NFS Gateway                 Do not specify any hosts
  DataNode                    elephant, tiger, horse, monkey, but not lion; the order in which the hosts are specified is not significant
Cloudera Management Service
  Service Monitor             lion
  Activity Monitor            lion
  Host Monitor                lion
  Reports Manager             lion
  Event Server                lion
  Alert Publisher             lion
YARN (MR2 Included)
  ResourceManager             horse
  JobHistory Server           monkey
  NodeManager                 Same as DataNode (elephant, tiger, horse, monkey, but not lion)
To assign a role, click the fields with one or more host names in them. For example, the field under NameNode initially has the value lion. To change the role assignment for the NameNode to elephant, click the field under NameNode (that has lion in it). A list of hosts appears. Select elephant, and then click OK. The field under NameNode now has the value elephant in it.
When you have finished assigning roles, compare your role assignments to the role assignments in the screenshot on the next page.
Verify that your role assignments are correct.
When you are certain that you have the correct role assignments, click Continue and proceed to the step after the screenshot.
6. The “Cluster Setup – Database Setup” page appears.
Fill in the details as shown here.
Database Host Name    Database Type    Database Name    Username    Password
lion                  MySQL            amon             amonuser    password
lion                  MySQL            rman             rmanuser    password
Click “Test Connection” to verify that Cloudera Manager can connect to the MySQL databases you created in an earlier Exercise in this course.
After you have verified that all connections are successful, click Continue.
7. The Cluster Setup – Review Changes page appears.
Specify the following values (change lib to log):
• Host Monitor Storage Directory – /var/log/cloudera-host-monitor
• Service Monitor Storage Directory – /var/log/cloudera-service-monitor
Click Continue.
8. The “Cluster Setup – Progress” page appears.
Progress messages appear while cluster services are being created and starting.
The following is a summary of what Cloudera Manager’s Cluster Setup wizard lists that it does for you:
• Formats the HDFS NameNode
• Starts the HDFS service
• Creates a /tmp directory in HDFS
• Creates a MR2 Job History directory in HDFS
• Creates a YARN NodeManager remote application log directory in HDFS
• Starts the YARN service
• Starts the Cloudera Management Service
• Deploys all client configurations
When all the cluster services have started, click Continue.
9. The “Cluster Setup – Congratulations!” page appears.
The page indicates that services have been added and are now configured and running. Click Finish.
The Cloudera Manager Home page appears. Cluster installation is now complete.
10. A note regarding the configuration warnings.
The configuration warnings, highlighted in red in the screenshot below, are expected, and indicate that although the services are in good health, they are operating in low memory conditions. In a production deployment you would want to ensure these warnings were addressed. However, in our training environment these warnings can be safely ignored.
Examine Running Processes on a Cluster Node
1. Review operating system services on a host in the cluster.
On elephant:
$ chkconfig | grep cloudera
$ sudo service cloudera-scm-agent status
You have already added the YARN (MR2 Included) and HDFS services, yet the only service registered with init scripts at the operating system level is the cloudera-scm-agent service.
With Cloudera Manager managed clusters, the cloudera-scm-agent service on each cluster node manages starting and stopping the deployed Hadoop daemons.
2. Review the running Java processes on a host in the cluster.
On elephant:
$ sudo jps
The Hadoop daemons run as Java processes. You should see the NodeManager, NameNode, and DataNode processes running.
Examine the details of one of the running CDH daemons.
$ sudo ps -ef | grep NAMENODE
You will see details about the process.
If you grep for DATANODE or NODEMANAGER you will see details on those processes. Grep for ‘cloudera’ to see all CDH and Cloudera Manager processes currently running on the machine.
Testing Your Hadoop Installation
We will now test the Hadoop installation by uploading some data. The hdfs dfs command you use in this step will be explored in greater detail in the next exercise.
1. Upload some data to the cluster.
The commands below have you unzip a file on the local drive and then upload it to HDFS’ /tmp directory.
On elephant:
$ cd ~/training_materials/admin/data
$ gunzip shakespeare.txt.gz
$ hdfs dfs -put shakespeare.txt /tmp
2. Verify that the file is now in HDFS.
In Cloudera Manager choose Clusters > HDFS. Then click on File Browser.
Browse into tmp and confirm that shakespeare.txt appears.
A Brief Tour of Cloudera Manager
Now that you have the Cloudera Manager managed cluster running, let’s briefly explore a few areas in the Admin console.
On the Home page, you will see the cluster named Cluster 1 that you created.
1. Click on the drop-down menu to the right of Cluster 1 to see many of the actions you can perform on the cluster, such as Add a Service, Stop, and Restart.
2. Click Hosts to view the current status of each managed host in the cluster. Clicking on the > icon in the Roles column for a host will reveal which roles are deployed on that host.
In this exercise and the exercises that follow you will discover many other areas of Cloudera Manager that will prove useful for administering your Hadoop cluster(s).
This is the end of the exercise.
Hands-On Exercise: Working With HDFS
In this Hands-On Exercise you will copy a large Web server log file into HDFS and explore the results in the Hadoop NameNode Web UI and the Linux file system. You will then create a snapshot of a directory in HDFS, delete some data, and then restore it to its original location in HDFS.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Perform all command-line steps in this exercise on elephant.
Confirm HDFS Processes and Settings
1. Confirm that all HDFS processes are running.
From the Cloudera Manager Home page click on HDFS and then click on the Instances tab.
Notice that the three daemons that support HDFS – the NameNode, SecondaryNameNode, and DataNode daemons – are running. In fact there are DataNodes running on four hosts.
2. Determine the current HDFS replication factor.
From the Cloudera Manager Home page click into the Search box and type “replication”.
Choose “HDFS: Replication Factor” which should be one of the search results.
You are taken to the HDFS Configuration page where you will find the dfs.replication setting that has the default value of 3.
3. Similarly, use the search box in the HDFS configuration page, and search for “block size” to discover the HDFS Block Size setting which defaults to 128 MiB.
Add Directories and Files to HDFS
1. As the hdfs user, create a home directory for the training user on HDFS and give the training user ownership of it.
On elephant:
$ sudo -u hdfs hdfs dfs -mkdir -p /user/training/
$ sudo -u hdfs hdfs dfs -chown training /user/training
2. Create an HDFS directory called /user/training/weblog, in which you will store the access log.
On elephant:
$ hdfs dfs -mkdir weblog
3. Extract the access_log.gz file and upload it to HDFS in a single step.
On elephant:
$ cd ~/training_materials/admin/data
$ gunzip -c access_log.gz \
| hdfs dfs -put - weblog/access_log
The put command uploaded the file to HDFS. The dash after -put tells the command to read its input from stdin and write it to the destination file.
Since the file size is 504 MB, HDFS will have split it into multiple blocks. Let’s explore how the file was stored.
4. Run the hdfs dfs -ls command to review the file’s permissions in HDFS.
On elephant:
$ hdfs dfs -ls /user/training/weblog
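The way HDFS splits the uploaded file can be worked out before looking at it. The sketch below assumes the 504 MB file size and the 128 MiB block size quoted in this exercise, and treats both as whole mebibytes for simplicity; the variable names are illustrative only.

```shell
# Sizes quoted in this exercise, treated as whole MiB (a simplifying assumption).
file_mb=504
block_mb=128

# Ceiling division: the number of HDFS blocks needed to hold the file.
blocks=$(( (file_mb + block_mb - 1) / block_mb ))

# Every block except the last is full; the last one holds the remainder.
last_block_mb=$(( file_mb - (blocks - 1) * block_mb ))

# Each block is stored once per replica (dfs.replication is 3 on this cluster).
replication=3
replicas=$(( blocks * replication ))

echo "blocks=$blocks last_block_mb=$last_block_mb block_replicas=$replicas"
```

Under these assumptions the file occupies four blocks, the last of which is only partly filled, and the cluster stores twelve block replicas in total.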
Analyze File Storage in HDFS and On Disk
1. Locate information about the access_log file in the NameNode Web UI.
a. From the Cloudera Manager Clusters menu, choose HDFS for your cluster.
b. In the HDFS Status page, click on “NameNode Web UI (Active)”. This will open the NameNode Web UI at http://elephant:50070.
c. Choose Utilities > “Browse the file system,” then navigate into the /user/training/weblog directory.
d. Notice that the permissions shown for the access_log file are identical to the permissions you observed when you ran the hdfs dfs -ls command.
e. Now select the access_log file in the NameNode Web UI file browser to bring up the File information window.
Notice that the Block Information drop down list has four entries, Block 0, Block 1, Block 2, and Block 3. This makes sense because, as you discovered, the HDFS Block Size on your cluster is 128 MB. The extracted access_log file is 504 MB.
Blocks 0, 1, and 2 all show a size of 134217728 (or 128 MB) in accordance with the size specified in the HDFS block size setting you observed earlier in this exercise. Block 3 is smaller than the others, as you can see in the “Size” details if you choose it.
Notice also that each block is available on three different hosts in the cluster. This is what you would expect since three is the current (and default) replication factor in HDFS. Also notice that each block may be replicated to different DataNodes than the other blocks that make up the file.
f. Choose Block 0 and write down the value that appears in the Size field. If Block 0 is not replicated on elephant then choose another block that is replicated on elephant.
________________________________________________________________________________
You will need this value when you examine the Linux file that contains this block.
g. Select the value of the Block ID field and copy it (Edit menu > Copy). You will need this value for the next step in this exercise.
2. Locate the HDFS block on the elephant Linux file system.
a. In Cloudera Manager’s HDFS Configuration page, conduct a search for “Data Directory”. You will see that the DataNode Data Directory is /dfs/dn.
b. Let’s find the blocks stored in that directory. On elephant:
$ sudo find /dfs/dn -name '*BLKID*' -ls
where BLKID is the actual Block ID you copied from the NameNode Web UI.
c. Verify that two files with the Block ID you copied appear in the find command output – one file with an extension, .meta, and another file without this extension.
d. Verify in the results of the find command output that the size of the file containing the HDFS block is exactly the size that was reported in the NameNode Web UI.
3. Starting any Linux editor with sudo, open the file containing the HDFS block. Verify that the first few lines of the file match the first chunk of the access_log file content.
Note: You must start your editor with sudo because you are logged in to Linux as the training user, and this user does not have privileges to access the Linux file that contains the HDFS block.
$ sudo head /dfs/dn/path/to/block
Note: Replace /path/to/block in the command above with the actual path to the block as shown in the results of the find command you ran in the previous step.
You can review the access_log file content on HDFS as follows:
$ hdfs dfs -cat weblog/access_log | head -
The results returned by the last two commands should match exactly.
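Rather than comparing the two outputs by eye, the comparison can be scripted. The sketch below is one possible approach: same_head is a hypothetical helper (not part of CDH), and the /tmp file names are illustrative. Note that the first lines only match for Block 0, since every later block begins partway through the file.

```shell
# same_head FILE_A FILE_B
# Succeeds when the first 10 lines of both files are identical.
same_head() {
  [ "$(head -n 10 "$1")" = "$(head -n 10 "$2")" ]
}

# Only attempt the HDFS side when the hdfs client is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -cat weblog/access_log | head -n 10 > /tmp/hdfs_head.txt
  # Save the block's first lines the same way, for example:
  #   sudo head -n 10 /dfs/dn/path/to/block > /tmp/block_head.txt
  # and then compare the two files:
  #   same_head /tmp/block_head.txt /tmp/hdfs_head.txt && echo "match"
fi
```

As in the step above, /path/to/block stands for the real block path reported by the find command.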
This is the end of the exercise.
Hands-On Exercise: Running YARN Applications
In this exercise you will run two YARN applications on your cluster. The first application is a MapReduce job. The second is an Apache Spark application. You will add the Spark service on your cluster before running the Spark application. After completing this exercise, your cluster will have the following components installed (items installed in this exercise highlighted in blue):
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Perform all steps in this exercise on elephant.
Submitting a MapReduce Application to Your Cluster
We will now test the Hadoop installation by running a sample Hadoop application that ships with the Hadoop source code. This is WordCount, a classic MapReduce program. We’ll run the WordCount program against the Shakespeare data added to HDFS in a previous exercise.
1. Since the code for the application we want to execute is in a Java Archive (JAR) file, we’ll use the hadoop jar command to submit it to the cluster. Like many MapReduce programs, WordCount accepts two additional arguments: the HDFS directory path containing input and the HDFS directory path into which output should be placed. Therefore, we can run the WordCount program with the following command.
On elephant:
$ hadoop jar /opt/cloudera/parcels/CDH-5.3.2-1.cdh\
5.3.2.p0.10/lib/hadoop-mapreduce/hadoop-mapreduce-\
examples.jar wordcount /tmp/shakespeare.txt counts
Verifying MapReduce Job Output
2. Once the program has completed you can inspect the output by listing the contents of the output (counts) directory.
On elephant:
$ hdfs dfs -ls counts
3. This directory should show all the data output for the job. Job output will include a _SUCCESS flag and one file created by each Reducer that ran. You can view the output by using the hdfs dfs -cat command.
On elephant:
$ hdfs dfs -cat counts/part-r-00000
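The raw output is long, so it can help to look only at the most frequent words. The sketch below assumes the usual WordCount output layout of one "word<TAB>count" pair per line; top_words is a hypothetical helper name, not a Hadoop command.

```shell
# top_words [N]
# Reads "word<TAB>count" lines on stdin and prints the N highest counts (default 10).
top_words() {
  sort -t "$(printf '\t')" -k2,2nr | head -n "${1:-10}"
}

# Run against the job output only when the hdfs client is available.
# The glob is quoted so that HDFS, not the local shell, expands it.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -cat 'counts/part-r-*' | top_words 10
fi
```

Reading every part-r-* file rather than only part-r-00000 keeps the sketch usable if the job ran more than one reducer.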
Review the MapReduce Application Details and Logs
In this task you will start by looking at details in the YARN Applications area of Cloudera Manager. The Application Details link in Cloudera Manager will then take you to the HistoryServer Web UI at http://monkey:19888.
As you go through the steps below, see if you can reconstruct what occurred where when you ran the MapReduce job, by creating a chart like the one below.
                     Node(s)
ApplicationMaster
Map Task(s)
Reduce Task(s)
1. Locate your MapReduce application in Cloudera Manager.
In Cloudera Manager, choose Clusters > YARN (MR2 Included), then click on Applications.
In the Results tab that displays, you should see the MapReduce application that just ran.
Note: There will be an entry for each completed MapReduce job that you have run on your cluster within the time frame of your search. The default search is for applications run within the last 30 minutes.
2. Access the HistoryServer Web UI to discover where the ApplicationMaster ran.
From the drop down menu for the “wordcount” application, choose “Application Details.”
This action will open a page in the HistoryServer Web UI with details about the job.
3. Locate where the ApplicationMaster ran and view the log.
Notice the ApplicationMaster section shows which cluster node ran the MapReduce ApplicationMaster. Click the “logs” link to view the ApplicationMaster log.
Notice also the number of map and reduce tasks that ran in order to complete the wordcount job. The number of reducers run by the job should correspond to the number of part-r-##### files you saw when you ran the hdfs dfs -ls command earlier. There are no part-m-##### files because the job ran at least one reducer.
4. Locate where the mapper task ran and view the log.
From the HistoryServer Web UI’s “Job” menu choose “Map tasks”.
From the Map Tasks table, click on the link in the Name column for the task.
The Attempts table displays. Notice the “Node” column shows you where the map task attempt ran.
Click the “logs” link and review the contents of the mapper task log. When done, click the browser back button to return to the previous page.
5. Locate where the reduce tasks ran and view the logs.
From the HistoryServer Web UI’s Job menu choose “Reduce tasks.”
From the Reduce Tasks table, click on the link in the Name column for one of the tasks.
The Attempts table displays. Notice the “Node” column shows you where this reducer task ran.
Click the “logs” link and review the contents of the log. Observe the amount of output in the Reducer task log. When done, click the browser back button to return to the previous page.
6. Determine the log level for Reducer tasks for the wordcount job.
Expand the Job menu and choose “Configuration.”
Twenty entries from the job configuration that were in effect when the wordcount job ran appear.
In the Search field, enter log.level.
Locate the mapreduce.reduce.log.level property. Its value should be INFO.
Note: INFO is the default value for the “JobHistory Server Logging Threshold” which can be found in the Cloudera Manager YARN Configuration page for your cluster.
Run the MapReduce Application with a Custom Log Level Setting
1. Remove the counts directory from HDFS and rerun the WordCount program, this time passing it a log level argument.
On elephant:
$ hdfs dfs -rm -r counts
$ hadoop jar /opt/cloudera/parcels/CDH-5.3.2-1.cdh\
5.3.2.p0.10/lib/hadoop-mapreduce/hadoop-mapreduce-\
examples.jar wordcount \
-D mapreduce.reduce.log.level=DEBUG \
/tmp/shakespeare.txt counts
Note: You must delete the counts directory before running the WordCount program a second time because MapReduce will not run if you specify an output path which already exists.
MapReduce programs coded to take advantage of the Hadoop ToolRunner allow you to pass several types of arguments to Hadoop, including run-time configuration parameters. The hadoop jar command shown above sets the log level for reduce tasks to DEBUG.
Note: The -D option, as used in the hadoop jar command above, allows you to override a default property setting by specifying the property and the value you want to assign.
When your job is running, look for a line in standard output similar to the following:
14/12/09 05:47:16 INFO mapreduce.Job: Running job: job_1391249780844_0004
2. After the job completes, locate and view one of the reducer logs.
From Cloudera Manager’s YARN Applications page for your cluster, locate the entry for the application that you just ran.
Click on the ID link, and use the information available under the Job menu’s “Configuration” and “Reduce tasks” links to verify the following:
• The value of the mapreduce.reduce.log.level configuration attribute is DEBUG.
• The Reducer task’s logs for this job contain DEBUG log records, and they contain more records than were written to the Reducer task’s logs during the previous WordCount job execution.
3. Verify the results of the wordcount job were written to HDFS using any of the following three methods.
Option 1: In Cloudera Manager, browse to the HDFS page for your cluster, then choose File Browser. Drill down into /user/training/counts.
Option 2: Access the HDFS NameNode Web UI at http://elephant:50070. Choose Utilities > “Browse the file system”, and navigate to the /user/training/counts directory.
Option 3: On any machine in the cluster that has the DataNode installed (elephant, tiger, monkey, or horse) run the following command in a terminal:
$ hdfs dfs -tail counts/part-r-00000
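The DEBUG records can also be checked from the command line. A MapReduce job ID and its YARN application ID share the same numeric suffix, so the job ID printed in the "Running job:" line can be turned into an application ID for yarn logs. The sketch below uses the sample job ID shown in this exercise; substitute the ID from your own run. It assumes that aggregated logs are available to the yarn logs command on your cluster.

```shell
# A job ID as printed in the "Running job:" line while the job runs.
job_id="job_1391249780844_0004"

# Swap the "job_" prefix for "application_"; the numeric suffix stays the same.
app_id="application_${job_id#job_}"
echo "$app_id"

# Count the DEBUG lines in the application's logs when the yarn client is available.
# grep -c exits non-zero when the count is 0, hence the "|| true".
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$app_id" | grep -c DEBUG || true
fi
```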
Add the Apache Spark Service
In this task, you will add the Spark service to your cluster using Cloudera Manager. You will then run a Spark application.
1. In Cloudera Manager, navigate to the Home page.
2. Select the “Add a Service” option from the drop-down menu for Cluster 1.
The Add Service Wizard appears.
3. Select Spark and click Continue.
The “Select the set of dependencies for your new service” page appears.
4. Select the row containing the hdfs and yarn services, then click Continue.
The Customize Role Assignments page appears.
5. Specify host assignments as follows:
• History Server – monkey only
• Gateway – monkey only
Click Continue.
6. Progress messages appear on the Progress page.
When the adding of the service has completed, click Continue.
7. The Congratulations page appears.
Click Finish.
8. The Home page appears.
A status indicator shows you that the Spark service is in good health.
Note: You may notice a configuration warning that appears next to “Hosts” on the Cloudera Manager home page. If you look into it, Cloudera Manager indicates that memory is overcommitted on host monkey. This configuration issue, along with the five other configuration warnings that appeared after the initial cluster installation, would need to be addressed in a true production cluster; however, they can be safely ignored in the classroom environment.
Run Spark as a YARN Application
You should complete these steps on monkey since monkey is where you just added the Spark gateway role.
1. Start the Spark shell and connect to the yarn-client Spark context on monkey.
Recall that the Spark Gateway service was installed on monkey so the Spark shell should be run from monkey.
On monkey:
$ spark-shell --master yarn-client
The Scala Spark shell will launch. You should eventually see the message, “Spark context available as sc.” If necessary, press the <Enter> key on your keyboard to see the scala> prompt.
2. Type in the commands below to run a word count application using the shakespeare.txt file that is already in HDFS.
This accomplishes something very similar to the job you ran in the MapReduce exercise, but this time the computational framework being used is Spark.
scala> val file = sc.textFile(
"hdfs://elephant:8020/tmp/shakespeare.txt")
scala> val counts = file.flatMap(line => line.split(
" ")).map(word => (word, 1)).reduceByKey(
_ + _).sortByKey()
scala> counts.saveAsTextFile(
"hdfs://elephant:8020/tmp/sparkcount")
scala> sc.stop()
scala> sys.exit()
3. View the application results written to HDFS by Spark.
On elephant:
$ hdfs dfs -cat /tmp/sparkcount/part-00000 | less
Review Application Details in the Spark History Server
View the Spark application details in Cloudera Manager and the Spark History Server Web UI.
1. In Cloudera Manager, go to the YARN Applications page for your cluster.
You will see a “Spark shell” application.
2. Click the link in the ID field.
A page in the YARN ResourceManager Web UI opens with details about the application.
3. Click on the History link.
A Spark Jobs page in the Spark History Server Web UI opens.
Notice that this Spark application consisted of two jobs that are now completed.
4. In the Completed Jobs area, click on the “sortByKey…” link for the first job that ran (Job Id 0).
Notice that this first job consisted of two stages.
5. Click on the “map at…” link for the first stage (Stage 0).
The “Details for Stage 0” page appears.
Here you can see that there were two tasks in this stage and you can see on which executor and on which host each task ran. You can also see task details such as duration, input data size, and the amount of data written to disk during shuffle operations.
6. Click on the Executors tab to see a summary of all the executors used by the Spark application.
Review the Spark Application Logs
1. Access the Spark application logs from the command line.
First locate the application ID.
On elephant:
$ yarn application -list -appStates FINISHED
Copy the application ID for the Spark application returned by the command above.
Now run this command (where appId is the actual application ID).
On elephant:
$ yarn logs -applicationId appId | less
Scroll through the logs returned by the shell. Notice that the logs for all the containers that ran the Spark executors are included in the results.
These Spark application logs are stored in HDFS under /user/spark/applicationHistory.
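Copying the application ID by hand can be avoided with a small filter. The sketch below assumes that each row of the yarn application -list output contains the application ID and the application name ("Spark shell" in this exercise); spark_app_ids is a hypothetical helper name, and the 40-line limit is arbitrary.

```shell
# spark_app_ids
# Reads `yarn application -list` output on stdin and prints the application ID
# of every row that mentions "Spark shell". sort -u removes repeats in case the
# same ID also appears in the row's tracking URL.
spark_app_ids() {
  grep 'Spark shell' | grep -o 'application_[0-9]*_[0-9]*' | sort -u
}

# Show the start of each Spark shell application's logs when the yarn client is available.
if command -v yarn >/dev/null 2>&1; then
  yarn application -list -appStates FINISHED 2>/dev/null | spark_app_ids |
    while read -r app_id; do
      yarn logs -applicationId "$app_id" | head -n 40
    done
fi
```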
This is the end of the exercise.
Hands-On Exercise: Explore Hadoop Configurations and Daemon Logs
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Exploring Hadoop Configuration Settings
In this task, you will explore some of the Hadoop configuration files auto-generated by Cloudera Manager.
1. Go to the directory that contains the Hadoop configuration for Cloudera Manager-managed daemons running on elephant and then view the contents.
On elephant:
$ cd /var/run/cloudera-scm-agent/process
$ sudo tree
Notice how there are separate directories for each role instance running on elephant: DataNode, NameNode, and NodeManager. Notice also that some files, such as hdfs-site.xml and core-site.xml, exist in more than one daemon’s configuration directory.
2. Compare the first 20 lines of the NameNode’s copy of core-site.xml to the NodeManager’s copy of the same file.
On elephant:
$ sudo head -20 nn-hdfs-NAMENODE/core-site.xml
$ sudo head -20 nn-yarn-NODEMANAGER/core-site.xml
In the commands above, replace each nn with the actual numbers generated by Cloudera Manager.
The entries in these core-site.xml files reflect the settings configured in Cloudera Manager for these particular role instances. Some of the settings reflect choices you made when you ran the Cloudera Manager installation wizard. Other entries are optimal initial or default values chosen by Cloudera Manager.
3. Make a configuration change in Cloudera Manager.
In Cloudera Manager, browse to the HDFS page for your cluster and then choose Configuration.
Conduct a search for the word “trash”. The NameNode Default Group “Filesystem Trash Interval” setting appears.
Double-click into the Value area where it currently reads “1 day(s)”.
Change this value to 2 day(s). Click Save Changes.
Notice the “Stale Configuration - Restart needed” icon that appears on the screen. Click on it.
The “Stale Configurations - Review Changes” screen appears, showing the changes to core-site.xml that will be made.
Click “Restart Cluster”. The “Cluster 1 Stale Configurations - Restart Cluster” screen appears.
Click “Restart Now”. The “Stale Configurations - Progress” screen appears. Wait for the child commands to complete successfully. Click Finish.
4. Return to the elephant terminal window and list the contents of the /var/run/cloudera-scm-agent/process directory.
$ sudo ls -l /var/run/cloudera-scm-agent/process
Notice that there are now two directories each for NameNode, DataNode, and NodeManager. The old settings have been retained; however, the new settings have also been deployed and will now be used. The directory with the higher number in the name is the newer one.
5. Find the difference between the old NameNode core-site.xml file and the new one.
On elephant:
$ sudo diff -C 2 nn-hdfs-NAMENODE/core-site.xml nn-hd\
fs-NAMENODE/core-site.xml
The nn values above should be replaced with the actual numbers with which the configuration directories are named.
You should see that the fs.trash.interval property value change has been deployed to the new NameNode configuration file.
6. Revert the configuration change and restart the cluster.
In Cloudera Manager, go to the HDFS Configuration page for your cluster.
Click the “History and Rollback” button.
The “Configuration and Role Group History” page displays.
Under Current Revision click “Details”.
The Revision Details screen displays. Notice the Filesystem Trash Interval property value that you just modified is listed. Choose “Revert Configuration Changes”.
A message appears indicating that the revision was reverted. Go to the HDFS Status page and notice the “Stale Configuration - Restart needed” icon.
Click on the icon and follow the steps to restart the cluster.
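Finding the right nn values by hand becomes tedious after several restarts. The sketch below picks the two highest-numbered NameNode configuration directories and diffs their core-site.xml files. newest_two is a hypothetical helper name, and the sketch assumes the nn-hdfs-NAMENODE naming shown above. Because fs.trash.interval is expressed in minutes, a change between 1 and 2 days should appear as a change between 1440 and 2880.

```shell
# newest_two ROLE
# Reads directory names on stdin and prints the two highest-numbered
# "<nn>-<ROLE>" entries, oldest first.
newest_two() {
  grep -E "^[0-9]+-$1\$" | sort -t- -k1,1n | tail -n 2
}

proc_dir=/var/run/cloudera-scm-agent/process
if [ -d "$proc_dir" ]; then
  # The exercise lists this directory with sudo, so sudo is used here as well.
  dirs=$(sudo ls "$proc_dir" | newest_two hdfs-NAMENODE)
  old=$(printf '%s\n' "$dirs" | head -n 1)
  new=$(printf '%s\n' "$dirs" | tail -n 1)
  if [ -n "$old" ] && [ "$old" != "$new" ]; then
    # diff exits with status 1 when the files differ, which is the expected case.
    sudo diff -C 2 "$proc_dir/$old/core-site.xml" "$proc_dir/$new/core-site.xml" || true
  fi
fi
```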
Examining Hadoop Daemon Log Files
In the previous exercise, you reviewed the application logs from MapReduce and Spark running as YARN applications. Here you will review Hadoop daemon log files, including the HDFS NameNode and YARN ResourceManager log files.
With Cloudera Manager, Hadoop daemons generate a .log.out file, a standard error log (stderr.log), and a standard output log (stdout.log).
In this task, you will examine Hadoop daemon log files using the NameNode Web UI, Cloudera Manager, the ResourceManager Web UI and the NodeManager Web UI.
1. View the NameNode log file using the NameNode Web UI.
Access the NameNode Web UI from the Quick Links on the HDFS Status page in Cloudera Manager.
From the NameNode Web UI, select Utilities > Logs. The list of folders and files in the /var/log/hadoop-hdfs directory on elephant appears.
Open the NameNode log file and review the file.
2. Access the daemon logs directly from Cloudera Manager.
In Cloudera Manager, choose Hosts from the top navigation bar.
Select elephant and then choose the Processes tab.
Locate the row for the NameNode and click “Full log file.”
The Log Details page opens at the tail end of the NameNode log file. Scroll up to view earlier log messages.
Note: If the log file is very large, and you want to see messages near the top, scrolling in the Cloudera Manager UI will be slow. Other tools provide quick access to the entire log file.
Click “Download Full Log” (in the upper right corner of the Log Details page) to download the entire log.
3. Review the NameNode daemon’s standard error and standard output logs using Cloudera Manager.
Return to the Processes page for elephant.
Click the Stdout link for the NameNode instance. The standard output log appears. Review the file, then return to the Processes page.
Click the Stderr link for the NameNode instance. The standard error log appears. Review the file.
Note: If you want to locate these log files on disk, they can be found on elephant in the /var/log/hadoop-hdfs and /var/run/cloudera-scm-agent/process/nn-hdfs-NAMENODE/logs directories.
4. Using Cloudera Manager, review recent entries in the SecondaryNameNode logs.
To find the log, go to the HDFS Instances page for your cluster, then click on the SecondaryNameNode role type, and in the Status page for the tiger host, click “Log File”.
5. Access the ResourceManager log file using the ResourceManager Web UI.
Start the ResourceManager Web UI (from Cloudera Manager’s YARN Status page or by specifying the URL http://horse:8088 in your browser).
Choose “Nodes” from the Cluster menu on the left side of the page.
Click the horse:8042 link to be taken to the NodeManager Web UI.
Expand the Tools menu on the left side of the page.
Click “Local logs.”
Finally, click the entry for the ResourceManager log file and review the file.
Hands-On Exercise: Using Flume to Put Data into HDFS
In this Hands-On Exercise you will use Flume to import dynamically generated data into HDFS. A very common use case for Flume is to collect access logs from a large number of Web servers on your network; we will simulate a simple version of this in the following exercise.
The diagram below shows the data flow that will occur once you complete the exercise.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Adding the Flume Service
1. Add the Flume service on elephant and horse.
From the Cloudera Manager Home page, click the down arrow next to your cluster and choose “Add a Service”.
Select “Flume” and click “Continue”. Note that HDFS is a dependency; however, the HDFS service is already added. Click “Continue” again.
Use the “Select hosts” button to add the Flume Agent on both elephant and horse, then click OK and Continue. At the Congratulations screen click “Finish”.
Note: You may notice that there are now two new ‘Hosts’ configuration issues, this time on elephant and horse, that (like the one that appeared on monkey earlier) are related to memory overcommit validation thresholds. In a true production cluster, you would want to address all configuration issues; however, you can safely ignore these in the classroom environment.
2. Update the configuration for the Flume agent on elephant.
From the Cloudera Manager Home page, click the Flume link and then click on the Instances tab.
Select the “Agent” that resides on elephant, then click Configuration.
Delete the default contents of the two properties listed in the table below entirely and replace with the lines shown below.
Note: The Configuration File lines are also available in ~/training_materials/admin/scripts/flume-tail1.txt
Tip: You can expand the Configuration File text area to make it easier to edit in by dragging it out the bottom right corner of the text box.
Property              Value
Agent Name            tail1
Configuration File    tail1.sources = src1
                      tail1.channels = ch1
                      tail1.sinks = sink1
                      tail1.sources.src1.type = exec
                      tail1.sources.src1.command = tail -F /tmp/access_log
                      tail1.sources.src1.channels = ch1
                      tail1.channels.ch1.type = memory
                      tail1.channels.ch1.capacity = 500
                      tail1.sinks.sink1.type = avro
                      tail1.sinks.sink1.hostname = horse
                      tail1.sinks.sink1.port = 6000
                      tail1.sinks.sink1.batch-size = 1
                      tail1.sinks.sink1.channel = ch1
Click Save Changes.
3. Update the configuration for the Flume agent on horse.
From the Cloudera Manager Flume page, click on the Instances tab.
Select the Agent that resides on horse, then click Configuration.
Delete the default contents of the two properties listed in the table below entirely and replace with the lines shown below.
Note: The Configuration File lines are also available in ~/training_materials/admin/scripts/flume-collector1.txt
Property              Value
Agent Name            collector1
Configuration File    collector1.sources = src1
                      collector1.channels = ch1
                      collector1.sinks = sink1
                      collector1.sources.src1.type = avro
                      collector1.sources.src1.bind = horse
                      collector1.sources.src1.port = 6000
                      collector1.sources.src1.channels = ch1
                      collector1.channels.ch1.type = memory
                      collector1.channels.ch1.capacity = 500
                      collector1.sinks.sink1.type = hdfs
                      collector1.sinks.sink1.hdfs.path = hdfs://elephant/user/flume/collector1
                      collector1.sinks.sink1.hdfs.filePrefix = access_log
                      collector1.sinks.sink1.channel = ch1
Ensure that “collector1.sinks.sink1.hdfs.path = hdfs://elephant/user/flume/collector1” shown above is all on a single line.
Click “Save changes.”
4. Create the /user/flume/collector1 directory in HDFS to store the files.
On elephant:
$ sudo -u hdfs hdfs dfs -mkdir -p \
/user/flume/collector1
$ sudo -u hdfs hdfs dfs -chown -R flume /user/flume
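The two agents only connect if the Avro sink on tail1 points at the host and port on which the Avro source on collector1 listens. The sketch below checks that pairing using the relevant lines from the two configurations above. prop is a hypothetical helper, and the temporary file names are illustrative; on the training machine you could point it at the flume-tail1.txt and flume-collector1.txt files instead.

```shell
# prop KEY FILE
# Prints the value of a "KEY = value" line from a properties-style file.
prop() {
  key=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots so they match literally
  sed -n "s/^${key}[[:space:]]*=[[:space:]]*//p" "$2"
}

tmp=$(mktemp -d)
cat > "$tmp/tail1.conf" <<'EOF'
tail1.sinks.sink1.type = avro
tail1.sinks.sink1.hostname = horse
tail1.sinks.sink1.port = 6000
EOF
cat > "$tmp/collector1.conf" <<'EOF'
collector1.sources.src1.type = avro
collector1.sources.src1.bind = horse
collector1.sources.src1.port = 6000
EOF

sink_host=$(prop tail1.sinks.sink1.hostname "$tmp/tail1.conf")
sink_port=$(prop tail1.sinks.sink1.port "$tmp/tail1.conf")
src_host=$(prop collector1.sources.src1.bind "$tmp/collector1.conf")
src_port=$(prop collector1.sources.src1.port "$tmp/collector1.conf")

if [ "$sink_host:$sink_port" = "$src_host:$src_port" ]; then
  echo "avro sink and source agree on $sink_host:$sink_port"
else
  echo "mismatch: sink $sink_host:$sink_port, source $src_host:$src_port"
fi
rm -r "$tmp"
```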
Starting the Data Generator
1. Open a new terminal window on elephant (or an ssh connection to elephant). In this terminal window, run the accesslog-gen.bash shell script, which simulates a Web server creating log files. This shell script also rotates the log files regularly.
On elephant:
$ accesslog-gen.sh /tmp/access_log
Note: The accesslog-gen.bash script is specific to the training environment and is not part of CDH.
2. Open a second new terminal window on elephant (or an ssh connection to elephant). Verify that the log file has been created. Notice that the log file is rotated periodically.
On elephant:
$ ls -l /tmp/access*
-rw-rw-r-- 1 training training 498 Nov 15 15:12 /tmp/access_log
-rw-rw-r-- 1 training training 997 Nov 15 15:12 /tmp/access_log.0
-rw-rw-r-- 1 training training 1005 Nov 15 15:11 /tmp/access_log.1
Starting the Flume Collector Agent
Here you start the Flume agent that will insert the data into HDFS. This agent receives data from the source Flume agent.
1. Start the collector1 Flume Agent on horse.
In Cloudera Manager, go to Flume’s Instances tab. Select the Agent hosted on horse.
From the “Actions for Selected” menu choose Start. In the “Start” window that appears, click “Start”.
In the “Command Details: Start” screen wait for confirmation that the agent started successfully. Click Close.
Starting the Flume Source Agent
Here you start the agent that reads the source log files and passes the data along to the collector agent you have already started.
1. Start the tail1 Flume Agent on elephant.
From the Cloudera Manager Flume Instances tab, select the Agent hosted on elephant.
From the “Actions for Selected” menu choose Start. In the “Start” window that appears, click “Start”.
In the “Command Details: Start” screen wait for confirmation that the agent started successfully. Click Close.
Viewing Data in HDFS
1. Confirm the data is being written into HDFS.
In Cloudera Manager browse to the HDFS page for your cluster and click on File Browser.
Drill down into /user/flume/collector1. You should see many access_log files.
2. View Metric Details.
Return to the Flume page and click on the Metric Details tab. Here you can see details related to the Channels, Sinks, and Sources of your running Flume agents. If you are interested, an explanation of the metrics available is at http://bit.ly/flumemetrics.
Increase the File Size in HDFS (Optional)
These steps are optional, but may be of interest to some students.
1. Edit the collector1 agent configuration settings on horse by adding these three additional name-value pairs:
collector1.sinks.sink1.hdfs.rollSize = 2048
collector1.sinks.sink1.hdfs.rollCount = 100
collector1.sinks.sink1.hdfs.rollInterval = 60
Click Save Changes.
2. From the Flume Status page, click on the Stale Configuration - refresh needed icon and follow the prompts to refresh the cluster.
3. Execute hdfs dfs -ls /user/flume/collector1 in a terminal window and note the file size of the more recent content posted by Flume to HDFS.
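For the Flume HDFS sink, rollSize is measured in bytes, rollCount in events, and rollInterval in seconds, and a file is rolled as soon as any one of these thresholds is reached. To see the effect of the new settings without reading every listing line, the sizes can be averaged. The sketch below assumes the standard hdfs dfs -ls layout, in which the fifth column is the file size in bytes; avg_size is a hypothetical helper name.

```shell
# avg_size
# Reads `hdfs dfs -ls` output on stdin and prints the number of files and
# their mean size in bytes. Only lines that start with "-" (regular files)
# are counted, so the "Found N items" header and directories are skipped.
avg_size() {
  awk '$1 ~ /^-/ { n++; total += $5 }
       END { if (n) printf "%d files, average %d bytes\n", n, total / n }'
}

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /user/flume/collector1 | avg_size
fi
```

Running it before and after the configuration refresh gives a rough picture of the change, keeping in mind that files written under the old settings remain in the directory and pull the average down.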
Viewing the Logs
1. Check the log files to see messages.
In Cloudera Manager choose Diagnostics > Logs.
Click Select Sources and configure as follows:
• Uncheck all sources except Flume
• Set the Minimum Log Level to INFO
• Leave the time frame of your search set to 30 minutes
Click Search.
Browse through the logged actions from both Flume agents.
Cleaning Up
1. Stop the log generator by hitting Ctrl-C in the first terminal window.
2. Stop both Flume agents from the Flume Instances page in Cloudera Manager.
3. Remove the generated access log files from the /tmp directory to clear up space on your virtual machine.
On elephant:
$ rm -rf /tmp/access_log*
This is the end of the Exercise.
Hands-On Exercise: Importing Data with Sqoop
For this exercise you will import data from a relational database using Sqoop. The data you load here will be used in a subsequent exercise.
Consider the MySQL database movielens, derived from the MovieLens project from University of Minnesota. (See note at the end of this exercise.) The database consists of several related tables, but we will import only two of these: movie, which contains about 3,900 movies; and movierating, which has about 1,000,000 ratings of those movies.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Perform all steps in this exercise on elephant.
Reviewing the Database Tables
First, review the database tables to be loaded into Hadoop.
1. Log on to MySQL.
On elephant:
$ mysql --user=training --password=training movielens
2. Review the structure and contents of the movie table.
On elephant:
mysql> DESCRIBE movie;
. . .
mysql> SELECT * FROM movie LIMIT 5;
3. Note the column names for the table.
____________________________________________________________________________________________
4. Review the structure and contents of the movierating table.
On elephant:
mysql> DESCRIBE movierating;
. . .
mysql> SELECT * FROM movierating LIMIT 5;
5. Note these column names.
____________________________________________________________________________________________
6. Exit MySQL.
On elephant:
mysql> quit;
Adding the Sqoop 1 Client
1. Add the Sqoop 1 Client gateway on elephant.
From the Home page in Cloudera Manager, click the down arrow icon next to Cluster 1 and choose “Add a Service”.
Select Sqoop 1 Client and click Continue.
At the “Custom Role Assignments” page, click on the Select hosts box and choose to add the Gateway on elephant. Click Continue.
The “Progress” page appears. Once the client configuration has been deployed successfully, click Continue. At the “Congratulations” screen click Finish.
2. Using sudo, create a symlink to the MySQL JDBC driver.
On elephant:
$ sudo ln -s /usr/share/java/mysql-connector-java.jar \
/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10\
/lib/sqoop/lib/
Now run the command below to confirm the symlink was properly created.
On elephant:
$ readlink -f /opt/cloudera/parcels/CDH-5.3.2-\
1.cdh5.3.2.p0.10/lib/sqoop/lib/mysql-connector-java.jar
If the symlink was properly defined, the command should return the /usr/share/java/mysql-connector-java.jar path.
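The readlink -f check can also be expressed programmatically. The Python sketch below is an illustration only: the helper name link_resolves_to is an assumption, and the demonstration runs in a temporary scratch directory with empty stand-in files rather than against the real parcel directory.

```python
import os
import tempfile


def link_resolves_to(link_path, expected_target):
    """Return True if link_path is a symlink that fully resolves to expected_target."""
    return os.path.islink(link_path) and (
        os.path.realpath(link_path) == os.path.realpath(expected_target)
    )


# Demonstration in a scratch directory; these files are hypothetical stand-ins
# for the JDBC driver and the Sqoop lib directory.
with tempfile.TemporaryDirectory() as scratch:
    driver = os.path.join(scratch, "mysql-connector-java.jar")
    lib_dir = os.path.join(scratch, "lib")
    os.mkdir(lib_dir)
    open(driver, "w").close()

    link = os.path.join(lib_dir, "mysql-connector-java.jar")
    os.symlink(driver, link)

    demo_ok = link_resolves_to(link, driver)
    demo_missing = link_resolves_to(os.path.join(lib_dir, "absent.jar"), driver)

print(demo_ok, demo_missing)
```

On the exercise host you would pass the link path under the Sqoop lib directory and the /usr/share/java/mysql-connector-java.jar path instead of the scratch files.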
Importing with Sqoop
You invoke Sqoop on the command line to perform several commands. With it you can connect to your database server to list the databases (schemas) to which you have access, and list the tables available for loading. For database access, you provide a connect string to identify the server, and your username and password.
1. Show the commands available in Sqoop.
On elephant:
$ sqoop help
You can safely ignore the warning that Accumulo does not exist since this course does not use Accumulo.
2. List the databases (schemas) in your database server.
On elephant:
$ sqoop list-databases \
--connect jdbc:mysql://localhost \
--username training --password training
(Note: Instead of entering --password training on your command line, you may prefer to enter -P, and let Sqoop prompt you for the password, which is then not visible when you type it.)
3. List the tables in the movielens database.
On elephant:
$ sqoop list-tables \
--connect jdbc:mysql://localhost/movielens \
--username training --password training
4. Import the movie table into Hadoop.
On elephant:
$ sqoop import \
--connect jdbc:mysql://localhost/movielens \
--table movie --fields-terminated-by '\t' \
--username training --password training
The --fields-terminated-by '\t' option separates the fields in the HDFS file with the tab character, which is sometimes useful if users will be working with Hive and Pig.
Warnings that packages such as hbase, hive-hcatalog, and accumulo are not installed are expected. It is not a problem that these packages are not installed on your system.
Notice how the INFO messages that appear show that a MapReduce job consisting of four map tasks was completed.
5. Verify that the command has worked.
On elephant:
$ hdfs dfs -ls movie
$ hdfs dfs -tail movie/part-m-00000
6. Import the movierating table into Hadoop using the command in step 4 as an example.
Verify that the movierating table was imported using the command in step 5 as an example or by using the Cloudera Manager HDFS page’s File Browser.
7. Optionally observe the results in Cloudera Manager’s YARN Applications page.
Navigate to the YARN Applications page.
Notice the last two YARN applications that ran (movie.jar and movierating.jar).
Explore the job details for either or both of these jobs.
This is the end of the Exercise.
Note:
This exercise uses the MovieLens data set, or subsets thereof. This data is
freely available for academic purposes, and is used and distributed by
Cloudera with the express permission of the UMN GroupLens Research
Group. If you would like to use this data for your own research purposes,
you are free to do so, as long as you cite the GroupLens Research Group in
any resulting publications. If you would like to use this data for commercial
purposes, you must obtain explicit permission. You may find the full dataset,
as well as detailed license terms, at http://www.grouplens.org/node/73
Hands-On Exercise: Querying HDFS With Hive and Cloudera Impala
In this exercise, you will add Hive and Cloudera Impala services to your cluster, enabling you to query data stored in HDFS.
You will start by adding the ZooKeeper service to your cluster. ZooKeeper is a prerequisite for HiveServer2, which you will deploy when you add the Hive service.
Then you will add the Hive service, including a Hive Metastore Server and HiveServer2, to your Hadoop cluster, and configure the service.
Next, you will add the Impala service to your cluster and configure the service.
Then you will populate HDFS with data from the movierating table and run queries against it using both Hive and Impala.
At the end of this exercise, you should have daemons deployed on your five hosts as follows (new services added in this exercise are highlighted in blue):
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Adding the ZooKeeper Service
In this task, you will add a ZooKeeper service to your cluster. A running ZooKeeper service is a prerequisite for many other services so you will add this service now. When you add additional services later in the class, you may notice the Exercise instructions have you select ZooKeeper as part of the set of dependencies for the new service to use.
1. From the Cloudera Manager Home page, select the ‘Add a Service’ menu option from the drop-down menu to the right of Cluster 1.
The Add Service Wizard appears.
2. Select ZooKeeper and click Continue.
The Customize Role Assignments page appears.
3. Specify the following host assignments:
• Server – elephant, horse, and tiger but not lion or monkey
Click OK and then click Continue.
4. The Review Changes page appears.
Review the default values specified on this page and click Continue.
5. Progress messages appear on the Progress page.
When the adding of the service has completed, click Continue.
6. The Congratulations page appears.
Click Finish.
7. The Cloudera Manager Home page appears.
The zookeeper service now appears in the list of services.
A health issue icon may appear next to the new ZooKeeper service temporarily, however this should go away momentarily and the status should change to Good Health.
Note: you may notice that there is now one more ‘Hosts’ configuration issue, this time on tiger. Like the ones that appeared earlier, it is related to memory overcommit validation thresholds. In a true production cluster, you would want to address all configuration issues; however, you can safely ignore these in the classroom environment.
Adding the Hive Service to Your Cluster
In this task, you will add the Hive service to your cluster using Cloudera Manager.
You will configure the Hive Metastore to use the MySQL metastore database and to have a HiveServer2 instance.
Hive and Impala can both make use of a single, common Hive metastore. Recall that you created a few databases by running a SQL script prior to installing your CDH cluster. One of the databases you created is named metastore, which will be used by Hive and Impala as a common metastore for storing table definitions.
At the end of the task, you will run a simple Hive command to verify that the Hive service has been added.
1. From the Cloudera Manager Home page, select the “Add a Service” option for Cluster 1.
The Add Service Wizard appears.
2. Select Hive and click Continue.
The “Select the set of dependencies for your new service” page appears.
3. Select the row containing the hdfs, yarn, and zookeeper services, then click Continue.
The Customize Role Assignments page appears.
4. Specify host assignments as follows:
• Gateway – elephant only
• Hive Metastore Server – elephant only
• WebHCat Server – Do not select any hosts
• HiveServer2 – elephant only
Verify that you have selected only elephant and not any additional hosts for the Gateway, Hive Metastore Server, and HiveServer2 roles.
Click Continue.
5. The “Database Setup” page appears.
Specify values for the database as follows:
• Database Host Name – lion
• Database Type – MySQL
• Database Name – metastore
• User Name – hiveuser
• Password – password
Click “Test Connection” and verify that connection to the MySQL database is successful.
Click Continue.
6. The Review Changes page appears.
Review the default values specified on this page and click Continue.
7. Progress messages appear on the Progress page.
When the adding of the service has completed, click Continue.
8. The Congratulations page appears.
Click Finish.
9. The Home page appears.
A status indicator shows you that the hive service is in good health.
10. Verify that you can run a Hive command from the Beeline shell.
On elephant:
$ beeline -u jdbc:hive2://elephant:10000/default \
-n training
In the Beeline shell, type the following command:
> SHOW TABLES;
No tables should appear, because you haven’t defined any Hive tables yet, but you should not see error messages.
11. Exit the Beeline shell.
> !quit
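The -u argument passed to Beeline in step 10 is a JDBC URL that names the HiveServer2 host, its port, and the database to open. The Python sketch below pulls those parts out of such a URL. It is a rough illustration, not part of Beeline: the helper name split_hive2_url is an assumption, session parameters appended with ";" are not handled, and treating a missing database as "default" is a simplifying assumption.

```python
from urllib.parse import urlsplit


def split_hive2_url(jdbc_url):
    """Return the host, port, and database named in a jdbc:hive2:// URL."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix + "hive2://"):
        raise ValueError("not a HiveServer2 JDBC URL: " + jdbc_url)
    # Drop the "jdbc:" prefix so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    return {
        "host": parts.hostname,
        "port": parts.port,
        # Assumption for this sketch: no database in the URL means "default".
        "database": parts.path.lstrip("/") or "default",
    }


print(split_hive2_url("jdbc:hive2://elephant:10000/default"))
print(split_hive2_url("jdbc:hive2://elephant:10000"))
```

Both URLs above point at the HiveServer2 instance on elephant, which is why the two forms are used interchangeably in these exercises.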
Adding the Impala Service
In this task, you will add the Impala service to your cluster.
Cloudera Manager will automatically configure Impala to use the Hive Metastore service that you created earlier in this exercise.
1. From the Cloudera Manager Home page, select the Add a Service option for Cluster 1.
The Add Service Wizard appears.
2. Select Impala and click Continue.
The “Select the set of dependencies for your new service” page appears.
3. Select the row containing the hdfs and hive services, then click Continue.
The Customize Role Assignments page appears.
4. Specify host assignments as follows:
• Impala Catalog Server – horse
• Impala StateStore – horse
• Impala Daemon – elephant, horse, monkey and tiger
Click Continue.
Note: When you added the Hive service, you specified elephant as a Gateway host, which caused the Hive client to be added on elephant. With Impala, the Impala client (impala-shell) is automatically added on all hosts running Impala.
5. The Review Changes page appears.
Review the default values specified for the Impala Daemon Scratch Directories on this page and click Continue.
6. Progress messages appear on the Progress page.
When the adding of the service has completed, click Continue.
7. The Congratulations page appears.
Click Finish.
8. Restart the HDFS service.
After adding Impala, on the Cloudera Manager home page you will notice that the HDFS service has stale configurations as indicated by the icon that appears.
Click on the “Stale Configuration: Restart needed” icon. The “Review Changes” page appears.
Click “Restart Cluster” then click “Restart Now”. When the action completes click Finish.
9. Confirm the Impala services started.
Browse to the Impala page for your cluster and click on Instances.
You should see that the Impala Catalog Server, Impala Daemons, and Impala StateStore services have all started successfully.
Running Hive and Impala Queries
In this task, you will define the movierating table in Hive and run a simple query against the table. Then you will run the same query in Impala and compare performance.
Note: You already imported the movierating table into HDFS in the Importing Data With Sqoop exercise.
1. Review the movierating table data imported into HDFS during the Sqoop exercise.
On elephant:
$ hdfs dfs -cat movierating/part-m-00000 | head
2. Start the Beeline shell and connect to HiveServer2.
On elephant:
$ beeline -u jdbc:hive2://elephant:10000 -n training
3. Define the movierating table in Hive.
> CREATE EXTERNAL TABLE movierating
> (userid INT, movieid STRING, rating TINYINT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION '/user/training/movierating';
4. Verify that you created the movierating table in the Hive metastore.
On elephant:
> SHOW TABLES;
You should see an entry for the movierating table.
5. Run a simple Hive test query that counts the number of rows in the movierating table.
On elephant:
> SELECT COUNT(*) FROM movierating;
Browse to Cloudera Manager’s YARN Applications page to view the MapReduce job that was run when you executed the Hive query. In the Results tab, make note of the amount of time the query takes to execute when running in Hive.
6. Terminate the Beeline shell.
On elephant:
> !quit
7. Start the Impala shell.
On elephant:
$ impala-shell
8. Connect to the Impala Daemon running on horse.
> CONNECT horse;
9. Since you defined a new table after starting the Impala server on horse, you must now refresh that server’s copy of the Hive metadata.
> INVALIDATE METADATA;
10. In Impala, run the same query against the movierating table that you ran in Hive.
> SELECT COUNT(*) FROM movierating;
Compare the amount of time it took to run the query in Impala to the amount of time it took in Hive.
11. Exit the Impala shell.
> quit;
12. Explore Impala queries in Cloudera Manager.
In Cloudera Manager, from the Clusters menu’s Activities section, choose Impala Queries.
Notice that both of the queries you ran from the impala-shell appear in the Results panel.
For the ‘SELECT’ query that you ran, choose “Query Details” from the drop down menu on the right. Browse through the query details noting the information about the query that is available to you.
This is the end of the Exercise.
Hands-On Exercise: Using Hue to Control Hadoop User Access
In this exercise, you will configure a Hue environment that provides business analysts with the following capabilities:
• Submitting Pig, Hive, and Impala queries
• Managing definitions in the Hive Metastore
• Browsing the HDFS file system
• Browsing YARN applications
Users will be able to access their environments by using a Web browser, eliminating the need for administrators to install Hadoop client environments on the analysts’ systems.
At the end of this exercise, you should have daemons deployed on your five hosts as follows (new daemons shown in blue):
The Hue server will be deployed on monkey. The HttpFS and Oozie servers on monkey will support several Hue applications.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Adding an HttpFS Role Instance to the hdfs Service
In this task, you will use the Cloudera Manager wizard to add an HttpFS role instance to the hdfs service. The HttpFS role instance will reside on monkey. After adding the role instance, you will run a curl command from the command line to verify that HttpFS works correctly.
1. In Cloudera Manager, navigate to the HDFS Instances page.
2. Click Add Role Instances.
The “Add Role Instances to HDFS” page appears.
3. For HttpFS, specify monkey.
Click Continue.
4. The hdfs Role Instances page reappears. The HttpFS (monkey) role instance now appears in the list of role instances.
Notice that the status for this role instance is Stopped.
A status indicator shows you that the HDFS service has good health.
5. Start the HttpFS role instance.
In the HDFS Instances page, check the box next to HttpFS and from the Actions for Selected menu choose Start.
In the Start dialog window click Start. When the Command Details window shows that the command completed successfully click Close.
6. To verify HttpFS operation, run the HttpFS LISTSTATUS operation to examine the content in the /user/training directory in HDFS.
On elephant:
$ ssh training@monkey netstat -tan | grep :14000
$ curl -s "http://monkey:14000/webhdfs/v1/\
user/training?op=LISTSTATUS&user.name=training" \
| python -m json.tool
Note: The HttpFS REST API returns JSON objects. Piping the JSON objects to python -m json.tool makes the objects easier to read in standard output.
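The JSON returned by LISTSTATUS can also be consumed by a program instead of being read by eye. The Python sketch below builds the same URL as the curl command in step 6 and extracts the name and type of each entry from a response body. The helper names liststatus_url and summarize are assumptions made for illustration, and the sample response is made up: it follows the FileStatuses/FileStatus layout of the WebHDFS REST API but is trimmed to a few fields and does not come from a live cluster.

```python
import json
from urllib.parse import urlencode


def liststatus_url(host, hdfs_path, user, port=14000):
    """Build the HttpFS LISTSTATUS URL for an HDFS path."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, hdfs_path, query)


def summarize(response_text):
    """Return (name, type) pairs from a LISTSTATUS JSON response."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


# Hypothetical, trimmed response body used only to exercise the parser.
sample_response = json.dumps({"FileStatuses": {"FileStatus": [
    {"pathSuffix": "movie", "type": "DIRECTORY", "length": 0},
    {"pathSuffix": "movierating", "type": "DIRECTORY", "length": 0},
]}})

print(liststatus_url("monkey", "/user/training", "training"))
print(summarize(sample_response))
```

To run this against the classroom cluster, you would fetch the URL (for example with urllib.request.urlopen) and pass the decoded response text to summarize.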
Adding the Oozie Service
In this task, you will add the Oozie service to your cluster. With Cloudera Manager, Oozie is a prerequisite for adding the Hue service. We won’t use these services in these exercises, but feel free to explore them on your own if you like.
You will configure the Oozie instance to reside on monkey and the Solr instance to run on tiger.
1. In Cloudera Manager, navigate to the Home page.
2. Select the Add a Service option for Cluster 1.
The Add Service Wizard appears.
3. Select Oozie and click Continue.
The “Select the set of dependencies for your new service” page appears.
4. Select the row containing the hdfs, yarn, and zookeeper services, then click Continue.
The Customize Role Assignments page appears.
5. Specify host assignments as follows:
• Oozie Server – monkey
Click Continue.
6. The Review Changes page appears.
Review the default values specified on this page and click Continue.
7. Progress messages appear on the Progress page.
When the adding of the service has completed, click Continue.
8. The Congratulations page appears.
Click Finish.
9. The Home page appears.
A status indicator shows you that the oozie service is in good health.
Adding the Hue Service
In this task, you will add a Hue service to your cluster, configuring the Hue instance to run on monkey.
1. From the Cloudera Manager Home page, select the Add a Service option for Cluster 1.
The Add Service Wizard appears.
2. Select Hue and click Continue.
The “Select the set of dependencies for your new service” page appears.
3. Select the row containing the hdfs, hive, impala, oozie, yarn, and zookeeper services, then click Continue.
The Customize Role Assignments page appears.
4. Specify host assignments as follows:
• Hue Server – monkey
Click Continue.
5. Progress messages appear on the Progress page.
When the adding of the service has completed, click Continue.
6. The Congratulations page appears.
Click Finish.
7. The Home page appears.
A status indicator shows you that the hue service is in good health.
8. Submit a Hadoop WordCount job so that there will be a MapReduce job entry that you can browse in Hue after you start the Hue UI.
On elephant:
$ hadoop jar /opt/cloudera/parcels/CDH-5.3.2-1.cdh\
5.3.2.p0.10/lib/hadoop-mapreduce/hadoop-mapreduce-\
examples.jar wordcount /tmp/shakespeare.txt test_output
9. Install Spark client configuration on elephant.
Previously you ran the Spark shell on monkey. Here you will add the Spark Gateway role on elephant so that you can run the Spark shell from elephant.
Navigate to the Spark Instances page for your cluster and click Add Role Instances.
Add the Gateway role to elephant.
Click on the “Stale configuration - client configuration redeployment needed” icon and in the “Cluster 1 Stale Configurations” page click “Deploy Client Configuration”.
In the Deploy Client Configuration page click “Deploy Client Configuration”.
In the Progress screen, wait for the commands to complete, then click Finish.
10. Start the Spark Shell so that there will be a Spark job entry in Hue.
On elephant:
$ spark-shell --master yarn-client
Leave the spark-shell open in the terminal for the rest of this exercise.
Exploring the Hue User Interface
In this task, you will log in to the Hue UI as an administrative user and briefly explore the following applications: Hue Home page, Hive UI, Cloudera Impala Query UI, Pig Editor, File Browser, Metastore Manager, Job Browser, Hue Shell, User Admin, and Help.
You will also explore how the Hue UI reports misconfiguration and determine why you cannot use the Job Designer and Oozie Editor/Dashboard applications.
Logging Into Hue
1. Maximize the browser window to give Hue enough space to display as many options as possible on its top menu.
2. View the Hue UI.
Access the “Hue Web UI” from the Cloudera Manager Hue Status page for your cluster or just browse to the URL at http://monkey:8888.
3. Log in to Hue.
Note that, as the message box in the browser indicates, because you are the first person to log in to this Hue service, you are actually defining the Hue superuser credentials in this step.
Type in admin as the user, with the password training, then click “Create Account”.
4. The Quick Start Wizard displays.
Click the Home icon.
At the “Did you know?” dialog, click “Got it, prof!”
The “My documents” page appears.
Access Hive Using Hue
1. If you completed the Querying HDFS With Hive and Cloudera Impala exercise, start the Hive Query Editor by selecting Query Editors > Hive.
Enter the following query to verify that Hive is working in Hue:
SHOW TABLES;
Click Execute. The result of the query should be the movierating table.
Enter another query to count the number of records in the movierating table:
SELECT COUNT(*) FROM movierating;
Give it some time to complete. The UI will first show the Log tab contents, then it should eventually show the Results tab. The query should run successfully.
Access Impala Using Hue
1. If you completed the Querying HDFS With Hive and Cloudera Impala exercise, start the Impala Query Editor by selecting Query Editors > Impala.
Enter the following query to verify that Impala is working in Hue:
SHOW TABLES;
Click Execute. The result of the query should be the movierating table.
Enter another query to count the number of records in the movierating table:
SELECT COUNT(*) FROM movierating;
The query should run successfully.
Access the Metastore Using Hue
1. Start the Metastore Manager by selecting Data Browsers > Metastore Tables.
The Metastore Manager appears with an entry for the movierating table.
Select the entry for the movierating table.
The schema for the Hive movierating table, which you created in the Hive exercise, appears in the Metastore Manager.
Notice the list of actions available in the ACTIONS menu.
Access Pig Using Hue
1. Start the Pig UI by selecting Query Editors > Pig.
The Pig Query Editor appears.
You can edit and save Pig scripts using Hue’s Pig Query Editor in your current Hue deployment.
Access HDFS Using Hue
1. Click on the File Browser (document icon) towards the top right of the Hue UI.
This opens the File Browser application.
Browse the HDFS file system. If you wish, execute some hdfs dfs commands from the command line to verify that you obtain the same results from the command line and the Hue File Browser.
Note the Upload menu as well. You could upload files to HDFS through Hue.
There are also Actions available giving the Hue user options to Rename, Move, Download or change permissions on HDFS files (assuming the user has the permissions to do so).
2. In the File Browser, navigate to the /user/admin directory.
On the first Hue login, Hue created a superuser – in this case, the admin user – and an HDFS path for that user – in this case, the /user/admin path.
3. In the File Browser, navigate to the /user/training/test_output directory – the output directory of the WordCount job that you ran before starting the Hue UI.
4. Click the entry for the part-r-00000 file – the output file from the WordCount job.
A read-only editor showing the contents of the part-r-00000 file appears.
Browse YARN Applications Using Hue
1. Select the Job Browser (list icon) option.
An entry for the Hive job that you ran earlier appears in the Hue Job Browser.
Specify training in the Username field.
An entry for the MapReduce wordcount job you ran in the previous step appears with the status “SUCCEEDED.”
Another entry for the Spark shell you should still have running appears with the status “RUNNING.”
Browse the completed “wordcount” job details by clicking on the link in the ID column and then looking through the details in the Attempts, Tasks, Metadata, and Counters tabs.
If you are interested, look in Cloudera Manager’s YARN Applications page for your cluster, locate the entry for the same wordcount job, and follow the “Application Details” link which takes you to the HistoryServer Web UI. Compare the details you find there with the information available in the Hue Job Browser.
2. Back in the terminal window on elephant, type exit to end the Spark interactive shell.
Browse Users, Documentation, Settings and Logs of Hue
1. Start the User Management Tool by selecting the “admin” menu (cog and wheels icon) and then choosing “Manage Users.”
The User Admin screen appears where you can define Hue users and groups and set permissions.
Notice the automatically created entry for the admin user.
You will create another user and a group in the next task.
2. Click the Documentation (question mark) icon.
Hue UI user documentation appears.
3. Click the About Hue icon (to the left of the Home icon).
The Quick Start Wizard’s Check Configuration tab shows “All OK. Configuration check passed.”
Choose the About Hue top-level menu and click into the Configuration tab to examine Hue’s configuration.
Click the Server Logs tab to examine Hue’s logs.
Setting up the Hue Environment for Business Analysts
Consider a scenario where you have been given a requirement to set up a Hue environment for business analysts. The environment will allow analysts to submit Hive and Impala queries, edit and save Pig queries, browse HDFS, manage table definitions in the Hive Metastore, and browse Hadoop jobs. Analysts who have this environment will not need Hadoop installed on their systems. Instead, they will access all the Hadoop functionality that they need through a Web browser.
You will use the User Admin application to set up the analysts’ Hue environment.
1. Verify that you are still logged in to Hue as the admin user.
2. Activate the Hue User Management tool by selecting admin > Manage Users.
3. Select Groups.
4. Click Add group, name the new group analysts.
Configure the permissions by selecting the ones listed below:
• about.access
• beeswax.access
• filebrowser.access
• hbase.access
• help.access
• impala.access
• jobbrowser.access
• metastore.write
• metastore.access
• pig.access
Click Add group.
5. Select Users.
6. Add a Hue user named fred with the password training.
In the “Step 2: Names and Groups” tab, make fred a member of the analysts group. However, make sure that fred is not a member of the default group.
Click “Add User”.
7. Sign out of the Hue UI (using the arrow icon in the top right of the screen).
8. Log back in to the Hue UI as user fred with password training.
Verify that in the session for fred, only the Hue applications configured for the analysts group appear. For example, the Administration menu does not allow fred to manage users. Fred also has no access to the Workflows, Search, and Security menus that are available to the admin user.
This is the end of the Exercise.
Hands-On Exercise: Configuring HDFS High Availability
In this exercise, you will reconfigure HDFS, eliminating the NameNode as a single point of failure for your Hadoop cluster.
You will start by modifying the hdfs service’s configuration to enable HDFS high availability.
You will then shut down services that you will no longer use in this exercise or other subsequent exercises.
Next, you will enable automatic failover for the NameNode. Automatic failover uses the ZooKeeper service that you added to your cluster in an earlier exercise. ZooKeeper is a prerequisite for HDFS HA automatic failover.
Once you have enabled HDFS high availability with automatic failover, you will intentionally bring one of the servers down as a test. HDFS services should still be available, but they will be served by a NameNode running on a different host.
At the end of this exercise, you should have daemons deployed and running on your five hosts as follows (new daemons shown in blue):
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Bringing Down Unneeded Services
Since you will no longer use Hue, Oozie, Impala, or Hive for the remaining exercises, you can stop their services to improve your cluster’s performance.
1. In Cloudera Manager, navigate to the Home page.
2. Stop the hue service as follows:
In the row for the hue service, click Actions > Stop.
Click Stop in the confirmation window.
Click Close after messages in the Command Details: Stop page indicate that the hue service has stopped.
The Home page reappears. The status of the hue service should have changed to Stopped.
3. Using steps similar to the steps you followed to stop the hue service, stop the oozie service.
4. Stop the impala service.
5. Stop the hive service.
6. Stop the flume service if it is running.
Verify that the only services that are still up and running on your cluster are the hdfs, yarn, spark, zookeeper, and mgmt services. All of these services should have good health.
Enabling HDFS High Availability
In this task you will configure your Hadoop configuration to use HDFS high availability.
1. Configure directories for JournalNodes to store edits directories.
On elephant:
$ sudo mkdir /dfs/jn
$ sudo chown hdfs:hadoop /dfs/jn
$ ssh training@horse sudo mkdir /dfs/jn
$ ssh training@horse sudo chown hdfs:hadoop /dfs/jn
$ ssh training@tiger sudo mkdir /dfs/jn
$ ssh training@tiger sudo chown hdfs:hadoop /dfs/jn
2. In Cloudera Manager, browse to the HDFS Configuration page for your cluster.
Select the Service-Wide category.
For ZooKeeper Service, click in the Value area and choose ZooKeeper.
Save Changes.
Now select the Instances tab.
The hdfs Role Instances page appears.
Observe that the hdfs service comprises the following role instances:
• Balancer on horse
• DataNodes on elephant, monkey, horse, and tiger
• The active NameNode on elephant
• The SecondaryNameNode on tiger
• An HttpFS server on monkey
The list of role instances will change after you enable high availability.
3. Click “Enable High Availability”.
4. The “Getting Started” page appears.
Change the Nameservice Name to mycluster.
Click Continue.
5. The “Assign Roles” page appears.
Specify the following:
• NameNode Hosts
  • elephant (Current)
  • tiger
• JournalNode Hosts
  • elephant, horse, tiger
Click Continue.
6. The “Review Changes” page appears.
Specify the value /dfs/jn in all three JournalNode Edits Directory fields.
Click Continue.
The “Progress” page appears and displays messages as Cloudera Manager enables HDFS high availability.
Note: Formatting the name directories of the current NameNode will fail. As described in the Cloudera Manager interface, this is expected.
When the process of enabling high availability has finished, click Continue.
An informational message appears informing you of post-setup steps regarding the Hive Metastore. You will not perform the post-setup steps because you will not be using Hive for any remaining exercises.
Click Finish.
7. The hdfs Role Instances page appears.
Observe that the hdfs service now comprises the following role instances:
• Balancer on horse
• DataNodes on elephant, tiger, horse, and monkey
• Failover Controllers on elephant and tiger
• An HttpFS server on monkey
• JournalNodes on elephant, tiger, and horse
• The active NameNode on elephant
• The standby NameNode on tiger
• No SecondaryNameNode
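The JournalNode directory preparation in step 1 repeats the same two commands on each remote host. A minimal sketch of the same work as a loop follows. It assumes, as in the exercise, that the training user can run sudo over ssh on horse and tiger. The jn_remote_commands helper name and the use of mkdir -p (which does not fail if the directory already exists) are illustrative choices, not part of the course materials. The helper only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# Directory that the JournalNodes will use for edits (from the exercise).
JN_DIR=/dfs/jn

# Hypothetical helper: print the commands needed on each remote host.
# Printing instead of executing lets you review the commands first.
jn_remote_commands() {
  for host in "$@"; do
    echo "ssh training@$host sudo mkdir -p $JN_DIR"
    echo "ssh training@$host sudo chown hdfs:hadoop $JN_DIR"
  done
}

# Remote JournalNode hosts in this exercise; elephant is handled locally.
jn_remote_commands horse tiger
```

To run the printed commands rather than only display them, pipe the output to a shell, for example: jn_remote_commands horse tiger | sh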
Verifying Automatic NameNode Failover
In this task, you will restart the active NameNode, bringing it down and then up. The standby NameNode will transition to the active state when the active NameNode goes down, and the formerly active NameNode will transition to the standby state.
Then you will restart the new active NameNode in order to restore the original states of the two NameNodes.
1. Navigate to the HDFS service’s Instances tab.
2. In the “Federation and High Availability” section of the page, observe that the state of one of the NameNodes is active and the state of the other NameNode is standby.
3. Scroll down to the “Role Instances” section.
4. In the Role Instances section of the page, select the check box to the left of the entry for the active NameNode.
5. Click Actions for Selected > Restart.
Click Restart to confirm that you want to restart the instance.
6. Wait for the restart operation to complete. When it has successfully completed, click Close.
Verify that the states of the NameNodes have changed: the NameNode that was originally active (elephant) is now the standby, and the NameNode that was originally the standby (tiger) is now active. If the Cloudera Manager UI does not immediately reflect this change, give it a few seconds and it will.
7. Go to Diagnostics > Events and notice the many recent event entries related to the restarting of the NameNode.
8. Back in the HDFS Instances tab, restart the NameNode that is currently the active NameNode.
After the restart has completed, verify that the states of the NameNodes have again changed.
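The NameNode states can also be checked from the command line with hdfs haadmin -getServiceState. The sketch below assumes the nameservice name mycluster chosen earlier and a host with the HDFS client configuration deployed, such as elephant. Cloudera Manager generates the NameNode IDs, so the sketch reads them from the dfs.ha.namenodes.mycluster property instead of guessing them. The function names are illustrative.

```shell
#!/bin/sh
NAMESERVICE=mycluster

# Turn a comma-separated ID list (the format of dfs.ha.namenodes.*)
# into one ID per line.
split_ids() {
  echo "$1" | tr ',' '\n'
}

# Print "<NameNode ID>: <state>" for each NameNode in the nameservice.
# Requires the hdfs client, so it is only defined here, not called.
show_nn_states() {
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.$NAMESERVICE")
  for nn in $(split_ids "$ids"); do
    echo "$nn: $(sudo -u hdfs hdfs haadmin -getServiceState "$nn")"
  done
}
```

Running show_nn_states on elephant before and after each restart should show the two NameNodes swapping between the active and standby states.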
This is the end of the Exercise.
Hands-On Exercise: Using the Fair Scheduler
In this exercise, you will submit some jobs to the cluster and observe the behavior of the Fair Scheduler.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
1. Adjust the YARN memory setting and restart the cluster.
In order to demonstrate the Fair Scheduler, you will need to increase the amount of memory that YARN containers are permitted to use on your cluster.
In Cloudera Manager, go to the YARN (MR2 Included) Configuration page.
In the search box, search for yarn.nodemanager.resource.memory-mb
Change the value of this parameter to 3 GB in both of the Role Groups where it appears.
Save the changes.
Navigate to the YARN Status page and click on the “Stale Configuration: Restart Needed” icon.
Follow the steps to restart the cluster. When the restart has completed, click Finish.
2. Analyze the script you will run in this exercise.
To make it easier to start and stop MapReduce jobs during this exercise, a script has been provided. View the script to gain an understanding of what it does.
On elephant:
$ cd ~/training_materials/admin/scripts
$ cat pools.sh
The script will start or stop a job in the pool you specify. It takes two parameters. The first is the pool name and the second is the action to take (start or stop). Each job it runs is relatively long-running and consists of 10 mappers and 10 reducers.
Note: Remember, we use the terms pool and queue interchangeably.
3. Start three Hadoop jobs, each in a different pool.
On elephant:
$ ./pools.sh pool1 start
$ ./pools.sh pool2 start
$ ./pools.sh pool3 start
It is recommended that you go through this exercise by starting jobs only when prompted in the instructions. However, depending on how quickly you complete the steps, a job may have completed earlier than the instructions anticipated. Therefore, please note that at any time during this exercise you can start additional jobs in any pool using any of the three commands you ran in this step.
4. Verify the jobs started.
In Cloudera Manager, browse to Clusters > YARN Applications.
You should notice that the three jobs have started. If the jobs do not yet display in the page, wait a moment and then refresh the page.
Note: Cloudera Manager refreshes pages automatically; however, in this exercise you may find it useful to refresh the pages manually to observe the latest status of the pools and the jobs running in them more quickly.
Leave this browser tab open.
5. Observe the status of the pools.
Open another browser tab, and in Cloudera Manager, browse to Clusters > Dynamic Resource Pools.
Analyze the details in the “Resource Pools Usage” table.
If pool1, pool2, and pool3 do not yet display, refresh the browser tab.
The table displays the pools in the cluster. The pools you submitted jobs to should have pending containers and allocated containers.
The table also shows the amount of memory and vcores that have been allocated to each pool.
6. Analyze the Per Pool charts.
On the same page in Cloudera Manager, notice the Per Pool Shares charts. There is one chart for Fair Share Memory and another for Fair Share VCores.
If nothing is displaying in these charts yet, wait a moment and they will. Optionally refresh the browser page.
Leave this browser tab open.
7. Start another job in another pool.
In the same shell session on elephant:
$ ./pools.sh pool4 start
8. Back in Cloudera Manager, observe the resource allocation effect of starting a new job in a new pool.
Occasionally refresh the Dynamic Resource Pools page.
Some pools may be initially over their fair share because the first jobs to run will take all available cluster resources.
However, over time, notice that the jobs running over fair share begin to shed resources, which are reallocated to other pools to approach fair share allocation for all pools.
Tip: Mouse over any one of the “Per Pool ...” charts and then click the double-arrow icon to expand the chart size.
9. Conduct further experiments.
Stop the job running in pool1.
$ ./pools.sh pool1 stop
Wait a minute or two, then observe the results in the charts on the Dynamic Resource Pools page.
Start a second job in pool3.
$ ./pools.sh pool3 start
Again observe the results.
10. Configure a Dynamic Resource Pool for pool2.
In the Dynamic Resource Pools page, click into the Configuration tab and click on the Resource Pools tab.
Click on “Add Resource Pool”.
In the General tab, set the Resource Pool Name to pool2. Keep DRF as the scheduling policy.
In the YARN tab, configure the following settings:
• Weight: 2
• Virtual Cores Min: 1
• Virtual Cores Max: 2
• Memory Min: 2400
• Memory Max: 5000
Click OK to save the changes.
11. Observe the effect of the new Dynamic Resource Pool on the resource allocations within the cluster.
In the YARN Applications page, check how many jobs are still running and in which pools they are running.
Use the pools.sh script to start or stop jobs so that there is one job running in each of the four pools.
Return to the Dynamic Resource Pools page’s Status tab to observe the effect of the pool settings you defined.
As you continue to observe the Per Pool Shares charts, you should soon see that pool2 is given a greater share of resources.
12. Clean up.
When you are done observing the behavior of the Fair Scheduler, stop all running jobs either by using the pools.sh script or by killing the applications from the YARN Applications page in Cloudera Manager.
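The greater share that pool2 receives in step 11 follows from its weight. As a rough worked example, under simplifying assumptions only (one active job in each of the four pools, the default weight of 1 for pool1, pool3, and pool4, and ignoring the Min/Max limits and the DRF policy), each pool's fair share is its weight divided by the sum of all pool weights. The function name is illustrative.

```shell
#!/bin/sh
# Integer percentage of cluster resources for one pool:
#   share = 100 * weight / (sum of all pool weights)
fair_share_pct() {
  weight=$1
  total=$2
  echo $(( 100 * weight / total ))
}

# pool1, pool3, and pool4 keep the default weight 1; pool2 has weight 2.
TOTAL=$(( 1 + 2 + 1 + 1 ))

echo "pool2: $(fair_share_pct 2 "$TOTAL")%"        # prints "pool2: 40%"
echo "others: $(fair_share_pct 1 "$TOTAL")% each"  # prints "others: 20% each"
```

On a real cluster the Max settings configured in step 10 may cap pool2 below this figure, so treat the numbers as a guide to the trend in the charts rather than as exact values.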
This is the end of the Exercise.
Hands-On Exercise: Breaking The Cluster
In this exercise, you will see what happens during failures of portions of the Hadoop cluster.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
1. Verify the existence of a large file in HDFS.
In a previous exercise you placed the weblog files in HDFS. Verify the files are there.
On elephant:
$ hdfs dfs -ls weblog
Only if you do not see the access_log file in HDFS, place it there now.
On elephant:
$ cd ~/training_materials/admin/data
$ hdfs dfs -mkdir weblog
$ gunzip -c access_log.gz \
| hdfs dfs -put - weblog/access_log
2. Locate a block that has been replicated on elephant as follows:
In the NameNode Web UI, navigate to the /user/training/weblog/access_log file. The File Information window appears.
Locate the Availability section in the File Information window for Block 0. You should see three hosts on which Block 0 is available. If one of the replicas is on elephant, note its Block ID. You will need to refer to the Block ID in the next exercise.
If none of Block 0’s replicas are on elephant, view the replication information for other blocks in the file until you locate a block that has been replicated on elephant. Once you have located a block that has been replicated on elephant, note its Block ID.
We will revisit this block when the NameNode recognizes that one of the DataNodes is a ‘dead node’ (after 10 minutes).
3. Now, intentionally cause a failure and observe what happens.
Using Cloudera Manager, from the HDFS Instances page, stop the DataNode running on elephant.
4. Visit the NameNode Web UI again and click on ‘Datanodes’. Refresh the browser several times and notice that the ‘Last contact’ value for the elephant DataNode keeps increasing.
5. Run the HDFS file system consistency check to see that the NameNode currently thinks there are no problems.
On elephant:
$ sudo -u hdfs hdfs fsck /
6. Wait for at least ten minutes to pass before starting the next exercise.
(Optional) In any terminal:
$ sleep 600 && echo "10 minutes have passed."
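The ten-minute wait reflects how long the NameNode takes to declare a silent DataNode dead. In Apache Hadoop that interval is twice dfs.namenode.heartbeat.recheck-interval plus ten times dfs.heartbeat.interval. The sketch below works through the arithmetic with the Apache Hadoop default values (300000 ms and 3 s). It assumes your cluster has not overridden those properties, which you can check in the HDFS configuration. The function name is illustrative.

```shell
#!/bin/sh
# Seconds until the NameNode marks a DataNode dead:
#   2 * recheck interval (given in ms) + 10 * heartbeat interval (given in s)
dead_node_timeout_secs() {
  recheck_ms=$1
  heartbeat_s=$2
  echo $(( 2 * recheck_ms / 1000 + 10 * heartbeat_s ))
}

# Apache Hadoop defaults: 300000 ms recheck interval, 3 s heartbeat interval.
dead_node_timeout_secs 300000 3   # prints 630
```

With the defaults the result is 630 seconds, or ten and a half minutes, so waiting slightly longer than ten minutes is the safest choice.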
This is the end of the Exercise.
Hands-On Exercise: Verifying The Cluster’s Self-Healing Features
In this exercise, you will see what has happened to the data on the dead DataNode.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
1. In the NameNode Web UI, click on Datanodes and confirm that you now have one ‘dead node.’
2. View the location of the block from the access_log file you investigated in the previous exercise. Notice that Hadoop has automatically re-replicated the data to another host to retain three-fold replication.
3. In Cloudera Manager, choose Clusters > Reports.
Click on “Custom report” at the bottom of the Disk Usage (HDFS) section.
Build a report with the following two filters (use the plus sign to add the second filter):
• Replication < 4
• Group equal to hadoop
Click “Generate Report”. You should see that all files match the criteria, since the replication factor for your cluster is set to three.
Change the Replication filter from < 4 to < 3 and generate the report again.
Now there should be no files that match the criteria, since data stored on elephant was replicated over to one of the other three DataNodes.
4. View charts that show when replication occurred.
From the Cloudera Manager HDFS page, choose Charts > Charts Library.
In the search box, type in replication. This will show you the “Pending Replication Blocks” and “Scheduled Replication Blocks” charts.
Note the spike in activity that occurred after the DataNode went down.
5. View the audit and log trails in Cloudera Manager.
In Cloudera Manager, click on Audits.
Note the timestamp for when the HDFS service was stopped.
Choose Diagnostics > Logs, select sources HDFS only, set the Minimum Log Level to “INFO”, and enter the search term “replicate”.
Click Search.
Scroll to the bottom. Notice the log messages related to blocks being replicated.
6. Run the hdfs fsck command again to observe that the file system is still healthy.
On elephant:
$ sudo -u hdfs hdfs fsck /
7. Run the hdfs dfsadmin -report command to see that one dead DataNode is now reported.
On elephant:
$ sudo -u hdfs hdfs dfsadmin -report
8. Use Cloudera Manager to restart the DataNode on elephant, bringing your cluster back to full strength.
9. Run the hdfs fsck command again to observe the temporary over-replication of blocks.
On elephant:
$ sudo -u hdfs hdfs fsck /
Note that the over-replication situation will resolve itself (if it has not already) now that the previously unavailable DataNode is once again running.
If the command above did not show any over-replicated blocks, go to Diagnostics > Logs in Cloudera Manager and search the HDFS source for “ExcessRepl”. You should find evidence of the temporary over-replication in the log entries.
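The block locations you inspected in the NameNode Web UI can also be listed from the command line by adding the -files -blocks -locations options to hdfs fsck. The sketch below assumes the access_log path used in these exercises. The block_ids helper is hypothetical; it pulls block IDs out of the report so that you can find the block you noted earlier. Because the exact layout of fsck output varies between Hadoop versions, the pattern matches only the blk_<number> part of each block name.

```shell
#!/bin/sh
LOG_PATH=/user/training/weblog/access_log

# Hypothetical helper: extract unique block IDs (blk_<number>) from
# fsck output read on standard input.
block_ids() {
  grep -oE 'blk_-?[0-9]+' | sort -u
}

# Print every block of the file with the DataNodes that hold a replica.
# Requires the hdfs client, so it is only defined here, not called.
list_block_locations() {
  sudo -u hdfs hdfs fsck "$LOG_PATH" -files -blocks -locations
}
```

On elephant, list_block_locations prints the full report, and list_block_locations | block_ids prints only the block IDs.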
This is the end of the Exercise.
Hands-On Exercise: Taking HDFS Snapshots
In this exercise, you will enable HDFS snapshots on a directory and then practice restoring data from a snapshot.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
1. Enable snapshots on a directory in HDFS.
In Cloudera Manager, go to the HDFS page for your cluster and click File Browser.
Browse to /user/training, then click Enable Snapshots.
In the “Enable Snapshots” window, keep the Snapshottable Path set to /user/training and click Enable Snapshots.
The command completes. Notice in the message displayed on the “Program:” line that snapshots can also be enabled from the command line using the hdfs dfsadmin tool.
Click Close. Notice that there is now a “Take Snapshot” button.
2. Take a snapshot.
Still in the Cloudera Manager File Browser at /user/training, click “Take Snapshot”. Give it the name snap1 and click OK.
After the snapshot completes, click Close.
The snapshot section should now show your “snap1” listing.
3. Delete data from /user/training, then restore the data from the snapshot.
Now let’s see what happens if we delete some data.
On elephant:
$ hdfs dfs -rm -r weblog
$ hdfs dfs -ls /user/training
The second command should show that the weblog directory is now gone.
However, your weblog data is still available, which you can see by running the commands here:
$ hdfs dfs -ls /user/training/.snapshot/snap1
$ hdfs dfs -tail .snapshot/snap1/weblog/access_log
Restore a copy of the weblog directory to the original location and then verify it is back in place.
$ hdfs dfs -cp .snapshot/snap1/weblog weblog
$ hdfs dfs -ls /user/training
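The snapshot operations performed in Cloudera Manager have command-line equivalents. The sketch below is a minimal outline that assumes the /user/training directory from this exercise. The function names are hypothetical. Enabling snapshots requires HDFS superuser rights, whereas the directory owner can create snapshots.

```shell
#!/bin/sh
SNAP_DIR=/user/training

# Path under which HDFS exposes a named snapshot of a directory.
snapshot_path() {
  echo "$1/.snapshot/$2"
}

# The functions below require the hdfs client, so they are only defined here.

# Allow snapshots on the directory (already done if you used Cloudera Manager).
enable_snapshots() {
  sudo -u hdfs hdfs dfsadmin -allowSnapshot "$SNAP_DIR"
}

# Create a snapshot with the given name and list its contents.
take_snapshot() {
  hdfs dfs -createSnapshot "$SNAP_DIR" "$1"
  hdfs dfs -ls "$(snapshot_path "$SNAP_DIR" "$1")"
}

# Report what changed between a snapshot and the current state (".").
diff_since_snapshot() {
  hdfs snapshotDiff "$SNAP_DIR" "$1" .
}

snapshot_path "$SNAP_DIR" snap1   # prints /user/training/.snapshot/snap1
```

For example, diff_since_snapshot snap1 reports the changes made under /user/training since snap1 was taken. When calling take_snapshot, pick a name other than snap1 (for instance snap2), because snap1 already exists.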
This is the end of the Exercise.
Hands-On Exercise: Configuring Email Alerts
In this exercise, you will configure Cloudera Manager to use an email server to send alerts.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
1. Configure Cloudera Manager to send email alerts using the email server on lion.
In Cloudera Manager, choose Clusters > Cloudera Management Service.
Click on Configuration and then choose “Alert Publisher Default Group”.
Confirm the “Alerts: Enable Email Alerts” property is checked.
Configure the following:
• Alerts: Mail Server Username: training
• Alerts: Mail Server Password: training
• Alerts: Mail Message Recipients: training@localhost
• Alerts: Mail Message Format: text
Save the changes.
2. Restart the Cloudera Management Service.
3. Send a test alert from Cloudera Manager.
In Cloudera Manager, go to Administration > Alerts. You should see that the recipient(s) of alerts is now set to training@localhost.
Click on the “Send Test Alert” button at the top of the page.
4. Confirm emails are being received from Cloudera Manager.
The postfix email server is running on lion. Here you use the mail command line client to access the training user’s inbox.
On lion:
$ mail
The “Test Alert” email should show as unread (U).
At the & prompt, type in the number that appears to the right of the U and hit the <Enter> key so you can read the email.
After you are done reading the email, type q <Enter> to exit the mail client.
This is the end of the Exercise.
Troubleshooting Challenge: Heap O’ Trouble
It’s 8:30 AM and you are enjoying your first cup of coffee. Keri, who is making the transition from writing RDBMS stored procedures to coding Java MapReduce, shows up in your doorway before you’re even halfway through that first cup.
“I just tried to run a MapReduce job and I got an out of memory exception. I heard that there was 32GB on those new machines you bought. But when I run this stupid job, I keep getting out of memory errors. Isn’t 32GB enough memory? If I don’t fix this thing, I’m going to be in a heap of trouble. I told my manager I was 99% complete with my project but now I’m not even sure if I can do what I want to do with Hadoop.”
Put down your coffee and see if you can help Keri get her job running.
IMPORTANT: This exercise builds on the previous one. If you were unable to complete the previous exercise or think you may have made a mistake, run the following command and follow the prompts to prepare for this exercise before continuing:
$ ~/training_materials/admin/scripts/reset_cluster.sh
Recreating the Problem
1. Confirm files in HDFS.
On elephant:
$ hdfs dfs -ls /tmp/shakespeare.txt
The command above should show that shakespeare.txt is in HDFS.
Only if shakespeare.txt was not found, run these commands to place the file in HDFS.
On elephant:
$ cd ~/training_materials/admin/data
$ gunzip shakespeare.txt.gz
$ hdfs dfs -put shakespeare.txt /tmp
On elephant:
$ hdfs dfs -ls weblog/access_log
The command above should confirm that the access_log file exists.
Only if access_log was not found, run these commands to place the file in HDFS.
On elephant:
$ cd ~/training_materials/admin/data
$ hdfs dfs -mkdir weblog
$ gunzip -c access_log.gz \
| hdfs dfs -put - weblog/access_log
2. Run the HeapOfTrouble program.
On elephant:
$ cd ~/training_materials/admin/java
$ hadoop jar EvilJobs.jar HeapOfTrouble \
/tmp/shakespeare.txt heapOfTrouble
Attacking the Problem
The primary goal of this and all the other troubleshooting exercises is to start to become more comfortable analyzing problem scenarios by using Hadoop’s log files and Web UIs. Although you might be able to determine the source of the problem and fix it, doing so successfully is not the primary goal here.
Take as many actions as you can think of to troubleshoot this problem. Please write down the actions that you take while performing this challenge so that you can share them with other members of the class when you discuss this exercise later.
Fix the problem if you are able to.
Do not turn to the next page unless you are ready for some hints.
Some Questions to Ask While Troubleshooting a Problem
This list of questions provides some steps that you could follow while troubleshooting a Hadoop problem. Not all of the steps necessarily apply to all Hadoop issues, but this list is a good place to start.
• What is there that is different in the environment that was not there before the problem started occurring?
• Is there a pattern to the failure? Is it repeatable?
• If a specific job seems to be the cause of the problem, locate the task logs for the job, including the ApplicationMaster logs, and review them. Does anything stand out?
• Are there any unexpected messages in the NameNode, ResourceManager, and NodeManager logs?
• How is the health of your cluster?
  • Is there adequate disk space?
    • More specifically, does the /var/log directory have adequate disk space?
  • Might this be a swapping issue?
  • Is network utilization extraordinarily high?
  • Is CPU utilization extraordinarily high?
  • Can you correlate this event with any of the issues?
• If it seems like a Hadoop MapReduce job is the cause of the problem, is it possible to get the source code for the job?
• Does searching the Web for the error provide any useful hints?
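Several of the cluster health questions above can be answered with standard Linux tools on the affected host. The sketch below is one possible starting point, not a prescribed procedure. The disk_used_pct helper name is hypothetical, and the application ID shown in the comments is a placeholder that you would replace with the ID of the failing job.

```shell
#!/bin/sh
# Hypothetical helper: percentage of space used on the filesystem that
# holds the given path (POSIX df output, second line, fifth column).
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

echo "Disk used on /: $(disk_used_pct /)%"

# Other quick checks (uncomment on the host you are investigating):
# disk_used_pct /var/log   # is the log filesystem filling up?
# free -m                  # memory and swap usage in MB
# vmstat 5 3               # si/so columns show swap-in/swap-out activity
# uptime                   # load averages as a rough CPU indicator
# yarn logs -applicationId <application ID>   # task logs, if log aggregation is enabled
```

Recording the output of these checks alongside the job's logs makes it easier to correlate a failure with resource pressure on a particular host.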
Fixing the Problem
If you have time and are able to, fix the problem so that Keri can run her job.
Post-Exercise Discussion
After some time has passed, your instructor will ask you to stop troubleshooting and will lead the class in a discussion of troubleshooting techniques.
This is the end of the Exercise.
Appendix A: Setting up VMware Fusion on a Mac for the Cloud Training Environment
When performing the Hands-On Exercises for this course, you use a small CentOS virtual machine called Get2EC2. This VM is configured to use NAT networking. You connect to Amazon EC2 instances from the guest OS by starting SSH sessions. The Get2EC2 VM is supported for VMware or VirtualBox.
VMware Fusion, like other hypervisors, runs an internal DHCP server for NAT-ted guests that assigns IP addresses to the guests. From time to time, the internal DHCP server releases and renews the guests’ leases. Unfortunately, the internal DHCP server in VMware Fusion does not always assign the same IP address to a guest that it had prior to the release and renew, and the Get2EC2 VM’s IP address changes.
Changing the IP address results in problems for active SSH sessions. Sometimes the terminal window in which the client is running will freeze up, becoming unresponsive to mouse and keyboard input, and no longer displaying standard output. At other times, sessions will be shut down with a Broken Pipe error. If this happens, you will have to re-open any failed sessions.
If you are using VMware Fusion on a Mac to perform the Hands-On Exercises for this course, you need to decide whether you would prefer to take action or do nothing:
• If you have administrator privileges on your Mac, you can configure VMware Fusion to use a fixed IP address. The instructions for configuring VMware Fusion to use a fixed IP address appear below.
• If you have VirtualBox installed, you can use the VirtualBox Get2EC2 VM instead of VMware Fusion.
• You can do nothing, in which case you might encounter terminal freezes as described above.
To configure VMware Fusion to use a fixed IP address, perform the following steps:
1. Start VMware Fusion.
2. Create an entry in the Virtual Machines list for the Get2EC2 VM. To create the entry, drag the Cloudera-Training-Get2EC2-VM-1.0.vmx file to the Virtual Machines list.
You should see the Cloudera-Training-Get2EC2-VM-1.0 entry in the Virtual Machines list. We will refer to the Cloudera-Training-Get2EC2-VM-1.0 VM as the Get2EC2 VM.
3. Make sure the Get2EC2 VM is powered down.
4. Click once on the Get2EC2 VM entry in the Virtual Machines list to select the VM.
Note: If you accidentally double-click the entry, you start the VM. Before you proceed to the next step, power down the VM.
5. Click the Settings icon in the VMware Fusion Toolbar (or select Virtual Machines > Settings).
6. Click Network Adapter.
7. Click Advanced Options.
The MAC Address field appears.
8. If the MAC Address field is empty, click Generate to generate a MAC address for the Get2EC2 VM.
9. Copy the MAC address and paste it into a file where you can access it later. You will need to use the MAC address in a subsequent step.
10. Open the following file on your Mac using superuser (sudo) privileges:
• VMware Fusion 4 and higher: /Library/Preferences/VMware Fusion/vmnet8/dhcpd.conf
• VMware Fusion 3: /Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf
Look for the range statement. It should have a range of IP addresses. For example:
range 172.16.73.128 172.16.73.254;
11. Choose an IP address for the Get2EC2 VM. The IP address should have the same first three octets as the IP addresses in the range statement, but the fourth octet should be outside of the addresses in the range statement. Given the example of the range statement in the previous step, you would choose an IP address that starts with 172.16.73 and ends with a number lower than 128 (but not 0, 1, or 2; those numbers are reserved for other purposes).
For example, 172.16.73.10.
12. Add four lines to the bottom of the dhcpd.conf file as follows:
host Get2EC2 {
    hardware ethernet <MAC_Address>;
    fixed-address <IP_Address>;
}
Replace <MAC_Address> with the MAC address you generated in an earlier step.
Replace <IP_Address> with the IP address you chose in the previous step.
Be sure to include the semicolons after the MAC and IP addresses as shown in the example.
13. Save and close the dhcpd.conf file.
14. Run the following commands from a terminal window on your Mac:
For VMware Fusion 4 or higher:
$ sudo /Applications/VMware\ Fusion.app/Contents/\
Library/vmnet-cli --stop
$ sudo /Applications/VMware\ Fusion.app/Contents/\
Library/vmnet-cli --start
For VMware Fusion 3:
$ sudo /Library/Application\ Support/VMware\ Fusion/\
boot.sh --restart
15. Start the Get2EC2 VM.
16. After the VM has come up, run the following command in a Linux terminal window:
$ ip addr
Verify that the IP address that appears is the IP address that you specified in the dhcpd.conf file.
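The rule in step 11 for choosing a fixed address can be checked mechanically. The sketch below is an illustration that assumes the DHCP range varies only in the last octet, as in the example range statement. The function name fixed_ip_ok is hypothetical. It also rejects 255, which is an assumption added here because that value is normally the broadcast address.

```shell
#!/bin/sh
# Usage: fixed_ip_ok <candidate IP> <range start IP> <range end IP>
# Succeeds if the candidate shares the first three octets with the range,
# and its last octet lies outside the range and is not 0, 1, 2, or 255.
fixed_ip_ok() {
  ip=$1 lo=$2 hi=$3
  # The first three octets must match those of the range.
  [ "${ip%.*}" = "${lo%.*}" ] || return 1
  last=${ip##*.}
  # Exclude the reserved low values and the broadcast address.
  [ "$last" -gt 2 ] && [ "$last" -lt 255 ] || return 1
  # The last octet must fall below or above the DHCP range.
  [ "$last" -lt "${lo##*.}" ] || [ "$last" -gt "${hi##*.}" ]
}

if fixed_ip_ok 172.16.73.10 172.16.73.128 172.16.73.254; then
  echo "172.16.73.10 is usable"
fi
```

With the example range, 172.16.73.10 passes the check, while 172.16.73.130 (inside the range) and 172.16.73.2 (reserved) do not.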
This is the end of this Appendix.
Appendix B: Setting up VirtualBox for the Cloud Training Environment
Follow these steps to set up VirtualBox for the cloud training environment if you do not want to install VMware Fusion on your Mac.
VMware Fusion is our preferred hypervisor for students running this course on Mac OS. Please use VMware Fusion if possible. Use VirtualBox for this course only if it is your preferred virtualization environment and if you are knowledgeable enough to troubleshoot, on your own, any problems you might run into.
This setup activity comprises:
• Creating the Get2EC2 VM
• Powering up the VM
• Installing VirtualBox Guest Additions on the VM
This setup requires VirtualBox version 4 or higher.
1. Get the .vmdk file for the class from your instructor and copy it onto the system on which you will be doing the Hands-On Exercises.
2. Start VirtualBox.
3. Select Machine > New.
The Name and Operating System dialog box appears.
4. In the Name and Operating System dialog box, specify Get2EC2 as the Name, Linux as the Type, and Red Hat (not Red Hat 64-bit) as the Version. Click Continue.
The Memory Size dialog box appears.
5. In the Memory Size dialog box, accept the default memory size of 512 MB and click Continue.
The Hard Drive dialog box appears.
6. In the Hard Drive dialog box, select “Use an existing virtual hard drive file.”
7. In the field under the “Use an existing virtual hard drive file” selection, navigate to the .vmdk file for the class and click Open.
8. Click Create.
The Oracle VM VirtualBox Manager dialog box appears. The Get2EC2 VM appears on the left side of the dialog box, with the status Powered Off.
9. Click Start to start the Get2EC2 VM.
The VM starts up. After startup is complete, the GNOME interface appears. You are automatically logged in as the training user.
Now you are ready to install VirtualBox Guest Additions on the VM.
Note: The version of VirtualBox and Guest Additions must be the same. You must install Guest Additions now to guarantee compatibility between your version of VirtualBox and Guest Additions.
10. Select Devices > Install Guest Additions.
After several seconds, a dialog box appears prompting you to select how you want to install the version of VBOXADDITIONS on your system. Verify that Open Autorun Prompt is selected as the Action, then click OK.
11. Another dialog box appears prompting you to confirm you want to run the Guest Additions installer. Click Run.
12. The Authenticate dialog box appears, prompting you to enter the root user’s password. Specify training and click Authenticate.
13. Messages appear in the terminal window while VirtualBox is building and installing the Guest Additions.
When installation is complete, the message “Press Return to close this window” appears in the terminal window.
14. Press Return.
15. Select System > “Log Out training” to log out of your GNOME session.
After you have logged out, you are automatically logged back in as the training user.
You have completed the VirtualBox setup. Please return to the next step in “Configuring Networking on Your Cluster: Cloud Training Environment” and continue the setup activity for the cloud training environment.
This is the end of this Appendix.