Date posted: 05-Aug-2015
Category: Software
Uploaded by: venkatesh-narayanan
What is BigData
• Analyzing extremely large datasets computationally to reveal patterns, trends and associations.
• Characterized by the three Vs (volume, velocity and variety).
• Enhanced insight and decision making.
Microsoft BigData solutions
• Microsoft supports Hadoop-based BigData solutions.
• Built on top of Hortonworks Data Platform (HDP)
• Three distinct solutions based on HDP:
  • HDInsight
  • HDP for Windows
  • Microsoft Analytics Platform System
Hadoop
• Hadoop – Framework for solving big data problems using a scale-out, “divide and conquer” approach.
• HDFS – Hadoop Distributed File System. Allows data to be split across multiple nodes.
• MapReduce – Enables distributed processing of that data.
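The “divide and conquer” idea behind MapReduce can be illustrated with a small word count. This is a minimal, single-process sketch in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input string stands in for a block of data that would live on a separate node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk stands in for a data block held on a different node.
    # In a real cluster the map calls would run in parallel on those nodes.
    mapped = []
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["big data big insight", "data on many nodes", "big nodes"]
    print(word_count(chunks))
```

The map step only ever sees one chunk at a time, which is what lets the work be spread across nodes; only the shuffle and reduce steps need to bring results for the same key together.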
Hadoop Components
• Cluster – Collection of server nodes that stores data using HDFS and processes it.
• Data store – The data store on each server is part of a distributed storage service (HDFS or an equivalent).
• Query – Big data processing queries are run using MapReduce.
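How a distributed data store spreads one file over a cluster can be sketched as follows. The block size and replication factor mirror common HDFS defaults (128 MB and 3), but the round-robin placement is a simplifying assumption for illustration, not the real HDFS placement policy, and the function and node names are hypothetical.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the size in MB of each block for a file of the given size."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300)
    nodes = ["node1", "node2", "node3", "node4"]
    for block, holders in place_blocks(len(blocks), nodes).items():
        print(f"block {block} ({blocks[block]} MB) -> {holders}")
```

Because every block has copies on several nodes, a query can be scheduled on whichever node already holds the data, and losing a single node does not lose the file.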
HDInsight
• Implementation of Hadoop that runs on the Azure platform
• Pay only for what you use
• Dynamic allocation of nodes in the cluster
• Integrated with Azure storage
HDInsight - Data Storage
• HDInsight supports the following types of storage:
  • HDFS (standard Hadoop)
  • Azure Storage Blob
  • HBase
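When data is kept in Azure Blob storage, jobs on the cluster typically address it through a `wasb://` (or TLS-secured `wasbs://`) URI rather than an `hdfs://` path. The helper below is a small sketch of that URI layout; the function name and the container, account and path values are hypothetical placeholders.

```python
def wasb_uri(container, account, path, secure=True):
    """Build a wasb/wasbs URI for a blob in an Azure storage account."""
    scheme = "wasbs" if secure else "wasb"
    # Layout: <scheme>://<container>@<account>.blob.core.windows.net/<path>
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # "data" and "myaccount" are made-up names used only for illustration.
    print(wasb_uri("data", "myaccount", "/logs/2015/08.csv"))
```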
HDInsight – Data Processing
• Run jobs directly on the cluster using MapReduce
• Use external programs to connect to the cluster:
  • Pig – Execute queries by writing scripts in a high-level language
  • Hive – SQL-like queries on the data
  • Mahout – Machine learning library for running data mining queries
  • Storm – Real-time computation for processing fast, large streams of data
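To give a feel for the “SQL-like” style of a Hive query, the sketch below runs a simple aggregation against an in-memory SQLite database. SQLite is only a local stand-in chosen so the example is runnable without a cluster; the table name `page_views`, its columns and the sample rows are hypothetical. A statement of broadly this shape could be submitted to Hive, which would translate it into distributed jobs over data in the cluster's storage.

```python
import sqlite3

# A GROUP BY aggregation of the kind commonly written in Hive.
QUERY = """
SELECT category, COUNT(*) AS hits
FROM page_views
GROUP BY category
ORDER BY hits DESC, category
"""


def run_query(rows):
    """Load (url, category) rows into SQLite and return hits per category."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (url TEXT, category TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("/a", "news"), ("/b", "sports"), ("/c", "news")]
    for category, hits in run_query(sample):
        print(category, hits)
```

The point of the comparison is that the analyst writes a declarative query and leaves the distribution of work to the engine, instead of hand-coding map and reduce functions.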
Designing for HDInsight
• Determine the analytical goals and source data
• Plan and configure the infrastructure
• Obtain data and submit it to HDInsight
• Process the data
• Evaluate the results
• Tune the solution
Azure Data Lake
• A single place to store all structured and semi-structured data in its native format
• No fixed limit on data size
• Compatible with HDFS