Posted on 28-Nov-2014
avkash@bigdataperspective.com
Let's Start and Define Big Data
How Hadoop Fits in this scenario
http://www.packtpub.com/using-cloudera-impala/book
http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802
http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx
https://www.linkedin.com/in/avkashchauhan
Hadoop is an open source (Java-based), “scalable”, “fault-tolerant” platform for storing and processing large amounts of unstructured data, distributed across machines.
Flexibility: A single repository for storing and analyzing any kind of data, not bounded by schema
Scalability: Scale-out architecture divides the workload across multiple nodes using a flexible distributed file system
Low Cost: Deployed on commodity hardware and an open source platform
Fault Tolerant: Continues working even if node(s) go down
A system that moves the computation to where the data is.
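The fault-tolerance and "move computation to the data" points can be illustrated with a small sketch. It is a toy model, not how HDFS or YARN are implemented: the round-robin placement, the node names, and the helper functions `place_blocks` and `pick_node` are all assumptions made for illustration (real HDFS placement is rack-aware). The only detail taken from Hadoop itself is the default replication factor of 3.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    n = len(nodes)
    return {
        block: [nodes[(block + i) % n] for i in range(replication)]
        for block in range(num_blocks)
    }


def pick_node(block, placement, live_nodes):
    """Schedule the task on a live node that already stores the block."""
    for node in placement[block]:
        if node in live_nodes:
            return node
    return None  # every replica of this block is on a failed node


nodes = ["n1", "n2", "n3", "n4"]          # hypothetical cluster
placement = place_blocks(4, nodes)        # 4 blocks, 3 replicas each
live = {"n2", "n3", "n4"}                 # simulate n1 going down
schedule = {b: pick_node(b, placement, live) for b in placement}
```

With one node down, every block still has a live replica, and each task is assigned to a machine that already holds its input, so no block has to be copied across the network before processing starts.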
Hadoop Landscape
Hadoop Core Components
Data Storage: HDFS
Data Processing: MapReduce/YARN
Hadoop Common
Cloud
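To make the data-processing component more concrete, here is a minimal in-process sketch of the MapReduce programming model (map, shuffle, reduce) using word count. It runs on a single machine and does not use any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = run_job(["big data", "Big Hadoop"])
```

On a cluster the map calls would run in parallel on the nodes that hold the input blocks, and the shuffle would move grouped keys between machines; the sketch only shows the shape of the three stages.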
Applying Hadoop to Save $$
Concept of Data Lake
Hadoop in Cloud
Big Data Analytics
EDW (Enterprise Data Warehouse)
OLAP (Online Analytical Processing)
ODS (Operational Data Store)
Big Data Analytics With Hadoop
Data Storage
  Amazon: S3
  HDInsight: Azure Blobs
  Directives: Direct access to the compute machines for super-fast data delivery
Processing
  Amazon: EC2
  HDInsight: Azure Compute
  Directives: Dedicated machines ready to run with a specific version of the Hadoop runtime
Processing Libraries
  Amazon: Java-based, or any other language supported through Hadoop Streaming
  HDInsight: .NET-based code
  Directives: Users upload their own processing code as binaries/libraries
Results
  Amazon: S3
  HDInsight: Azure Blobs
  Directives: Once the job is completed, the results are stored back to the same data storage that was used as the source
Visualization
  Amazon: Custom
  HDInsight: Custom
  Directives: 3rd-party applications can connect to the storage to perform visualization
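The "any other language supported through Hadoop Streaming" entry can be sketched as follows. Hadoop Streaming passes records to the mapper and reducer as lines of text, with key and value separated by a tab, and delivers the reducer's input sorted by key. The `region,amount` input format and the function names below are assumptions for illustration only; in a real job the two functions would live in scripts that read `sys.stdin` and print to standard output.

```python
from itertools import groupby


def mapper(lines):
    """Turn each 'region,amount' line (hypothetical format) into 'region<TAB>amount'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        region, amount = line.split(",", 1)
        yield f"{region}\t{amount}"


def reducer(sorted_lines):
    """Sum the amounts per region; relies on the input being sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for region, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(amount) for _, amount in group)
        yield f"{region}\t{total}"


# Local simulation of the pipeline: map, sort by key, reduce.
raw = ["east,10", "west,5", "east,7"]
totals = list(reducer(sorted(mapper(raw))))
```

The `sorted(...)` call stands in for the sort-and-shuffle step that the framework performs between the map and reduce stages.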