Big data in Azure

Date post: 05-Aug-2015
Upload: venkatesh-narayanan

BigData in Azure
Venkatesh

Introduction to Azure

• Azure Cloud Service

• PaaS

• IaaS

What is BigData

• Analyzing extremely large datasets computationally to reveal patterns, trends and associations.

• Characterized by 3Vs (Volume, Velocity and Variety).

• Enhanced insight and decision making.

BigData vs Database

Microsoft BigData solutions

• Microsoft supports Hadoop-based BigData solutions.

• Built on top of Hortonworks Data Platform (HDP)

• Three distinct solutions based on HDP:

  • HDInsight

  • HDP for Windows

  • Microsoft Analytics Platform

Microsoft Data Platform

Hadoop

• Hadoop – Framework for solving big data problems using a scale-out, “divide and conquer” approach.

• HDFS – Hadoop Distributed File System. Allows data to be split across multiple nodes.

• MapReduce – Enables distributed processing.
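The “divide and conquer” idea behind MapReduce can be illustrated with a small single-process sketch. This is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data in Azure", "big data on Hadoop"]))
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across as many nodes as are available.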

Hadoop Components

• Cluster – Collection of server nodes that stores data using HDFS and processes it.

• Datastore – The data store on each server is part of a distributed storage service (HDFS or an equivalent).

• Query – Big data processing queries run as MapReduce jobs.
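How a distributed store spreads a file over a cluster can be sketched as follows. The sketch assumes a tiny block size and a simple round-robin placement, both chosen for readability; real HDFS uses much larger blocks and a rack-aware placement policy. The function and node names are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a simplification for illustration, not HDFS's actual policy.
    """
    replication = min(replication, len(nodes))
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_blocks(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
    for block_id, holders in placement.items():
        print(block_id, blocks[block_id], holders)
```

Keeping more than one copy of each block is what lets the cluster keep serving data when a single node fails.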

HDInsight

• Implementation of Hadoop that runs on Azure Platform

• Pay only for what you use

• Dynamic allocation of Nodes in the cluster

• Integrated with Azure storage

HDInsight - Data Storage

• The following types of storage are supported by HDInsight:

  • HDFS (Standard Hadoop)

  • Azure Storage Blob

  • HBase
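When Azure Storage Blob is used, jobs usually refer to data by a `wasb://` (or `wasbs://` for TLS) URI rather than an `hdfs://` path. The helper below is a minimal sketch that builds such a URI, assuming the commonly documented form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`. The function name and the container and account names are hypothetical.

```python
def wasb_uri(container, account, path="", secure=True):
    """Build a blob storage URI of the form used by HDInsight jobs.

    Assumes the wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    layout; container and account are placeholders supplied by the caller.
    """
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # "data" and "myaccount" are made-up names for illustration only.
    print(wasb_uri("data", "myaccount", "/logs/2015/"))
```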

HDInsight – Data Processing

• Run jobs directly on the cluster using MapReduce

• Use external programs to connect to the cluster:

  • Pig – Execute queries by writing scripts in a high-level language

  • Hive – SQL-like queries on the data

  • Mahout – Machine learning library for running data mining queries

  • Storm – Real-time computation for processing fast, large streams of data
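To give a feel for what a SQL-like Hive query computes, the sketch below pairs an example HiveQL statement with an equivalent aggregation in plain Python. The table name `logs`, the column `level`, and the function `count_by_level` are hypothetical; the Python code does not talk to a cluster and only mirrors the query's result on an in-memory sample.

```python
from collections import Counter

# An example HiveQL statement (table and column names are hypothetical).
# On a cluster, Hive would translate this into distributed jobs.
HIVE_QUERY = "SELECT level, COUNT(*) AS n FROM logs GROUP BY level"


def count_by_level(rows):
    """Local equivalent of the GROUP BY above: count rows per `level`."""
    return dict(Counter(row["level"] for row in rows))


if __name__ == "__main__":
    sample = [
        {"level": "ERROR", "msg": "disk full"},
        {"level": "INFO", "msg": "job started"},
        {"level": "ERROR", "msg": "node lost"},
    ]
    print(HIVE_QUERY)
    print(count_by_level(sample))
```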

Data Loading Options

Designing for HDInsight

• Determine the analytical goals and source data

• Plan and configure the infrastructure

• Obtain data and submit it to HDInsight

• Process the data

• Evaluate the results

• Tune the solution

Azure DataLake

• Single place to store all structured and semi-structured data in native format

• Unlimited data size

• Compatible with HDFS

Creating HDInsight Cluster

Summary

• Hadoop – De facto solution to the Big Data problem

• Windows Azure HDInsight Service:

  • Native Hadoop implementation

  • Managed Hadoop Service for Windows Azure

