
Hadoop Institutes in Bangalore

Description:
Best Hadoop institutes: Kelly Technologies is the best Hadoop training institute in Bangalore, providing Hadoop courses delivered by real-time faculty.
Transcript
  • Hadoop/MapReduce Computing Paradigm

    CS525: Special Topics in DBs, Large-Scale Data Management

    Presented by Kelly Technologies
    www.kellytechno.com

  • Large-Scale Data Analytics

    MapReduce computing paradigm (e.g., Hadoop) vs. traditional database systems

    Many enterprises are turning to Hadoop, especially for applications generating big data: web applications, social networks, scientific applications.

    www.kellytechno.com

  • Why is Hadoop able to compete?

    Hadoop vs. traditional database systems

    www.kellytechno.com

  • What is Hadoop?

    Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
    Large datasets: terabytes or petabytes of data.
    Large clusters: hundreds or thousands of nodes.
    Hadoop is an open-source implementation of Google's MapReduce.
    Hadoop is based on a simple programming model called MapReduce.
    Hadoop is based on a simple data model: any data will fit.

    www.kellytechno.com

  • What is Hadoop? (Cont'd)

    The Hadoop framework consists of two main layers (a minimal HDFS client sketch follows this slide):
    Distributed file system (HDFS)
    Execution engine (MapReduce)

    www.kellytechno.com
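
    Beyond the slide text, a minimal sketch of how client code might touch the HDFS layer is shown below, using the org.apache.hadoop.fs.FileSystem API; the path name is an illustrative assumption and the cluster settings come from the default configuration files.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class HdfsHello {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
              FileSystem fs = FileSystem.get(conf);          // connects to the configured file system
              Path p = new Path("/tmp/hello.txt");           // illustrative path

              // Write a small file; HDFS splits large files into blocks behind the scenes.
              try (FSDataOutputStream out = fs.create(p, true)) {
                  out.writeUTF("hello hdfs");
              }

              // Read it back through the same FileSystem abstraction.
              try (FSDataInputStream in = fs.open(p)) {
                  System.out.println(in.readUTF());
              }
          }
      }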

  • Hadoop Master/Slave Architecture

    Hadoop is designed as a master-slave, shared-nothing architecture:
    Master node (single node)
    Many slave nodes

    www.kellytechno.com

  • Design Principles of Hadoop

    Need to process big data.
    Need to parallelize computation across thousands of nodes.
    Commodity hardware: a large number of low-end, cheap machines working in parallel to solve a computing problem.
    This is in contrast to parallel DBs, which use a small number of high-end, expensive machines.

    www.kellytechno.com

  • Design Principles of Hadoop (Cont'd)

    Automatic parallelization and distribution, hidden from the end user.

    Fault tolerance and automatic recovery: nodes/tasks will fail and will recover automatically.

    Clean and simple programming abstraction: users only provide two functions, map and reduce (see the skeleton after this slide).

    www.kellytechno.com
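
    As a sketch of that abstraction (not part of the original slides), the two user-provided functions look roughly like this in the Hadoop Java API; the class names and the text-processing type parameters are assumptions for a generic job.

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;

      // The user's map function: consumes (key, value) pairs, produces intermediate pairs.
      class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          @Override
          protected void map(LongWritable key, Text value, Context ctx)
                  throws IOException, InterruptedException {
              // ... user logic, e.g. ctx.write(someTextKey, someIntValue);
          }
      }

      // The user's reduce function: consumes a key together with all of its grouped values.
      class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                  throws IOException, InterruptedException {
              // ... user logic: aggregate the values, then ctx.write(key, result);
          }
      }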

  • Who Uses MapReduce/Hadoop?

    Google: inventors of the MapReduce computing paradigm
    Yahoo: developed Hadoop, the open-source implementation of MapReduce
    IBM, Microsoft, Oracle
    Facebook, Amazon, AOL, Netflix
    Many others, plus universities and research labs

    www.kellytechno.com

  • Hadoop: How It Works

    www.kellytechno.com

  • Hadoop Architecture

    Master node (single node); many slave nodes
    Distributed file system (HDFS)
    Execution engine (MapReduce)

    www.kellytechno.com

  • Hadoop Distributed File System (HDFS)

    www.kellytechno.com

  • Main Properties of HDFS

    Large: an HDFS instance may consist of thousands of server machines, each storing part of the file system's data.
    Replication: each data block is replicated many times (default is 3); a configuration sketch follows this slide.
    Failure: failure is the norm rather than the exception.
    Fault tolerance: detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS; the Namenode constantly checks the Datanodes.

    www.kellytechno.com
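
    As a hedged illustration of the replication property (not from the slides), the default block replication factor is the dfs.replication setting, and individual files can be re-replicated through the FileSystem API; the file path below is an assumption.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class ReplicationDemo {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              conf.setInt("dfs.replication", 3);             // cluster default is typically 3
              FileSystem fs = FileSystem.get(conf);

              // Ask HDFS to keep 5 replicas of this (illustrative) file's blocks instead of 3.
              Path important = new Path("/data/important.log");
              fs.setReplication(important, (short) 5);

              // The Namenode detects under-replicated blocks and schedules extra copies on Datanodes.
          }
      }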

  • MapReduce Execution Engine (Example: Color Count)

    Input blocks on HDFS.
    Users only provide the Map and Reduce functions.

    www.kellytechno.com

  • Properties of the MapReduce Engine

    Job Tracker is the master node (runs with the Namenode).
    It receives the user's job, decides how many tasks will run (the number of mappers), and decides where to run each mapper (concept of locality).
    Example: this file has 5 blocks, so run 5 map tasks.

    Where should the task reading block 1 run? Try to run it on a node that holds a replica of block 1, e.g., Node 1 or Node 3 (a locality sketch follows this slide).

    www.kellytechno.com
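
    To make the locality idea concrete, here is a hedged sketch (not from the slides) that asks HDFS where each block of a file lives; this is the same placement information the Job Tracker consults when co-locating map tasks with their blocks. The input path is an assumption.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class BlockLocality {
          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(new Configuration());
              Path file = new Path("/data/input.txt");       // illustrative input file
              FileStatus status = fs.getFileStatus(file);

              // One BlockLocation per block; each lists the Datanodes holding a replica.
              BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
              System.out.println(blocks.length + " blocks -> run that many map tasks");
              for (int i = 0; i < blocks.length; i++) {
                  System.out.println("Block " + i + " lives on: "
                          + String.join(", ", blocks[i].getHosts()));
              }
          }
      }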

  • Properties of the MapReduce Engine (Cont'd)

    Task Tracker is the slave node (runs on each Datanode).
    It receives a task from the Job Tracker and runs it to completion (either a map or a reduce task).
    It is always in communication with the Job Tracker, reporting progress.
    In this example, one MapReduce job consists of 4 map tasks and 3 reduce tasks.

    www.kellytechno.com

  • Key-Value Pairs

    Mappers and Reducers are the users' code (provided functions); they only need to obey the key-value pair interface.
    Mappers: consume <key, value> pairs and produce <key', value'> pairs.
    Reducers: consume <key', list of value'> and produce <key'', value''>.
    Shuffling and sorting: a hidden phase between mappers and reducers that groups all identical keys from all mappers, sorts them, and passes them to a particular reducer in the form of <key', list of value'> (a small simulation follows this slide).

    www.kellytechno.com
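
    The following stand-alone sketch (not part of the slides, and requiring no Hadoop at all) simulates the same key-value flow in plain Java: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step folds each group into a count.

      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      public class PairFlowDemo {
          public static void main(String[] args) {
              String[] lines = { "red blue red", "green blue" };   // toy input

              // Map phase: each line produces (word, 1) pairs.
              List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
              for (String line : lines) {
                  for (String word : line.split("\\s+")) {
                      mapped.add(Map.entry(word, 1));
                  }
              }

              // Shuffle/sort phase (hidden in Hadoop): group all values by key.
              Map<String, List<Integer>> grouped = new HashMap<>();
              for (Map.Entry<String, Integer> pair : mapped) {
                  grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
              }

              // Reduce phase: fold each key's list of values into a single result.
              grouped.forEach((key, values) ->
                      System.out.println(key + " -> " + values.stream().mapToInt(Integer::intValue).sum()));
          }
      }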

  • MapReduce Phases

    Deciding what will be the key and what will be the value is the developer's responsibility.

    www.kellytechno.com

  • Example 1: Word Count

    Job: count the occurrences of each word in a data set.
    Map tasks feed reduce tasks (a complete Hadoop version is sketched below).

    www.kellytechno.com
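
    This is essentially the classic WordCount example distributed with Hadoop; the sketch below follows that well-known shape, with input and output HDFS directories taken from the command line.

      import java.io.IOException;
      import java.util.StringTokenizer;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCount {
          // Map: for every word in a line, emit (word, 1).
          public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
              private static final IntWritable ONE = new IntWritable(1);
              private final Text word = new Text();
              public void map(Object key, Text value, Context ctx)
                      throws IOException, InterruptedException {
                  StringTokenizer itr = new StringTokenizer(value.toString());
                  while (itr.hasMoreTokens()) {
                      word.set(itr.nextToken());
                      ctx.write(word, ONE);
                  }
              }
          }

          // Reduce: sum all the 1s that arrived for the same word.
          public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
              public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                      throws IOException, InterruptedException {
                  int sum = 0;
                  for (IntWritable v : values) sum += v.get();
                  ctx.write(key, new IntWritable(sum));
              }
          }

          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration(), "word count");
              job.setJarByClass(WordCount.class);
              job.setMapperClass(TokenizerMapper.class);
              job.setCombinerClass(IntSumReducer.class);   // optional map-side local aggregation
              job.setReducerClass(IntSumReducer.class);
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);
              FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
              FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }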

  • Example 2: Color Count

    Input blocks on HDFS.
    Job: count the number of occurrences of each color in a data set (a mapper sketch follows this slide).

    www.kellytechno.com
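
    Color Count has the same shape as Word Count; only the key changes. A hedged mapper sketch (not from the slides), assuming each input record is a comma-separated line whose first field is the color; the summing reducer from Word Count can be reused unchanged.

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      // Emit (color, 1) per record; a sum reducer then counts each color.
      public class ColorCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          @Override
          protected void map(LongWritable key, Text value, Context ctx)
                  throws IOException, InterruptedException {
              String color = value.toString().split(",")[0].trim();  // assumed record layout: color,rest...
              ctx.write(new Text(color), ONE);
          }
      }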

  • Example 3: Color Filter

    Job: select only the blue and the green colors.
    Input blocks on HDFS; each map task selects only the blue or green records.

    No need for a reduce phase (a map-only job sketch follows this slide).

    www.kellytechno.com
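
    A hedged sketch of such a map-only job (class names and record layout are assumptions): the filter lives entirely in the mapper, and setting the number of reduce tasks to zero tells Hadoop to skip the shuffle and reduce phases and write map output directly to HDFS.

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class ColorFilter {
          // Keep only records whose (assumed) first field is "blue" or "green"; drop the rest.
          public static class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
              @Override
              protected void map(LongWritable key, Text value, Context ctx)
                      throws IOException, InterruptedException {
                  String color = value.toString().split(",")[0].trim();
                  if (color.equals("blue") || color.equals("green")) {
                      ctx.write(value, NullWritable.get());   // matching records pass straight through
                  }
              }
          }

          public static void main(String[] args) throws Exception {
              Job job = Job.getInstance(new Configuration(), "color filter");
              job.setJarByClass(ColorFilter.class);
              job.setMapperClass(FilterMapper.class);
              job.setNumReduceTasks(0);                        // map-only job: no shuffle, no reduce
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(NullWritable.class);
              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));
              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }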

  • Bigger Picture: Hadoop vs. Other Systems

    Cloud computing: a computing model where any computing infrastructure can run on the cloud.
    Hardware and software are provided as remote services.
    Elastic: grows and shrinks based on the user's demand.
    Example: Amazon EC2.

    www.kellytechno.com

    Distributed databases vs. Hadoop

    Computing model:
      Distributed databases: notion of transactions; a transaction is the unit of work; ACID properties and concurrency control.
      Hadoop: notion of jobs; a job is the unit of work; no concurrency control.

    Data model:
      Distributed databases: structured data with a known schema; read/write mode.
      Hadoop: any data will fit, in any format (unstructured, semi-structured, structured); read-only mode.

    Cost model:
      Distributed databases: expensive servers.
      Hadoop: cheap commodity machines.

    Fault tolerance:
      Distributed databases: failures are rare; recovery mechanisms.
      Hadoop: failures are common over thousands of machines; simple yet efficient fault tolerance.

    Key characteristics:
      Distributed databases: efficiency, optimizations, fine-tuning.
      Hadoop: scalability, flexibility, fault tolerance.

  • Presented by Kelly Technologies

    www.kellytechno.com

