Consulting/Training
Josh Lane
Principal Architect, Wintellect
https://github.com/jplane/data-lake-webinar
An Introduction to Azure Data Lake
Principal Architect at Wintellect
Consulting, training, content development
Almost 20 years as software architect and developer
Focused primarily on .NET, Node.js, and cloud
Microsoft Azure MVP
Azure-in-the-ATL meetup
[email protected]
@jplane
whois Josh-Lane
consulting
Wintellect helps you build better software, faster, tackling the tough projects and solving the software and technology questions that help you transform your business.
Architecture, Analysis and Design
Full lifecycle software development
Debugging and Performance tuning
Database design and development
training
Wintellect's courses are written and taught by some of the biggest and most respected names in the Microsoft programming industry. Learn from the best.
Access the same training Microsoft’s developers enjoy
Real world knowledge and solutions on both current and cutting edge technologies
Flexibility in training options: onsite, virtual, on demand
Wintellect is the only company that offers the combined value of world class consulting services along with onsite, virtual and on-demand developer training. We help companies build better software, faster, helping you maximize and protect your consulting and training investments through ongoing knowledge transfer.
who we are
About Wintellect
What is a “data lake”?
“A single store of all data… ranging from raw data (which implies exact copy of source system data) to transformed data which is used for various forms including reporting, visualization, analytics and machine learning”
3 Pillars of Azure Data Lake
Query
Visualization
ADLS
ADLA
HDInsight
Comprehensive, cloud-based big data storage and analytics platform
Purpose-built from real-world experiences: Office 365, Skype, Bing, etc.
Leverage existing skills and technologies
Benefits of an Azure-hosted service
Elastic, dynamically provisioned compute resources for varying query needs
Infinite storage capacity
Focus on extracting meaning from data, not on infrastructure
What is Azure Data Lake?
HDFS-as-a-service
Durable, redundant storage
A variety of data scenarios
Unlimited capacity
High-volume + low-latency (IoT, etc.)
High throughput (massively parallel query)
Store data in its native format: structured, semi-structured, and unstructured
Data Lake Store
Data Lake Store – Importing Data
Managed, cloud-scale Apache Hadoop-as-a-service
Full complement of Apache technologies: Spark, Storm, HBase, etc.
Focus on queries and data, not infrastructure
Pay for only what you need and use
Leverage existing skills and toolchains: Hive, Pig, Sqoop, R, etc.
HDInsight
Low-barrier alternative (or complement) to HDInsight and Hadoop ecosystem
Scales dynamically to match data size and query complexity
Built on Apache YARN
Unit of interaction is an analytics job
Elastic infrastructure management is abstracted away
U-SQL: a query language rooted in SQL and C#
Data Lake Analytics
Based on SQL and C#
C# expressions and types
Tables, views, window functions, etc.
User-defined functions/operators/aggregators in C#
Typical job
1. Read data from named file/table/federated source
2. Transform rowset in an ordered pipeline
3. Output rowset to named table or file
U-SQL
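The three steps of a typical job (read, transform, output) can be sketched outside of U-SQL. The Python below is only an illustration of that shape, not U-SQL and not anything Data Lake Analytics runs: the input data, the column names (`region`, `query`, `latency_ms`), and the function names are all hypothetical, and an in-memory CSV string stands in for a named file in the store.

```python
import csv
import io

# Hypothetical input: stands in for a named file in the store.
RAW = """region,query,latency_ms
us,azure,120
eu,data lake,340
us,u-sql,95
eu,hadoop,210
"""


def read_rows(text):
    """Step 1: read a rowset from a named source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 2: transform the rowset in an ordered pipeline."""
    # Cast the latency column from text to int.
    typed = [{**r, "latency_ms": int(r["latency_ms"])} for r in rows]
    # Keep only rows at or above 100 ms (an arbitrary illustrative threshold).
    slow = [r for r in typed if r["latency_ms"] >= 100]
    # Aggregate per region: row count and total latency.
    totals = {}
    for r in slow:
        count, total = totals.get(r["region"], (0, 0))
        totals[r["region"]] = (count + 1, total + r["latency_ms"])
    # Emit one output row per region, ordered by region name.
    return [
        {"region": region, "queries": count, "avg_latency_ms": total / count}
        for region, (count, total) in sorted(totals.items())
    ]


def write_rows(rows):
    """Step 3: output the rowset to a named target (here, a CSV string)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["region", "queries", "avg_latency_ms"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


result = write_rows(transform(read_rows(RAW)))
```

Each step hands a rowset to the next, which is the same read/transform/output flow listed above; in a real job the source and target would be files, tables, or federated sources rather than strings.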
Data Lake Analytics – U-SQL, Federated Queries, Power BI integration
Azure Ecosystem Integration
Azure Data Lake
Federated SQL
Data Import
Visualization
Data Discovery
Security
Data Lake Store
$0.04 per GB per month for storage
$0.07 per 1 million transactions
50% preview discount
Data Lake Analytics
$0.017 per “Analytics Unit” per minute
$0.025 per completed job
50% preview discount
HDInsight - https://azure.microsoft.com/en-us/pricing/details/hdinsight/
Pricing
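To make the rates concrete, here is a rough estimator that uses only the numbers on this slide. It is a sketch under assumptions: the function names and the workload figures are hypothetical, the rates may have changed since the talk, and because the slide does not say whether the listed rates already include the 50% preview discount, the discount is left as an optional parameter that defaults to zero.

```python
# Rates copied from the pricing slide (USD); treat them as a snapshot.
STORE_PER_GB_MONTH = 0.04
STORE_PER_MILLION_TX = 0.07
ADLA_PER_AU_MINUTE = 0.017
ADLA_PER_JOB = 0.025


def store_cost(gb_stored, transactions, discount=0.0):
    """Monthly Data Lake Store estimate: storage plus transactions."""
    cost = (
        gb_stored * STORE_PER_GB_MONTH
        + (transactions / 1_000_000) * STORE_PER_MILLION_TX
    )
    # discount is an assumption: pass 0.5 to model the preview discount.
    return cost * (1 - discount)


def analytics_cost(jobs, analytics_units, minutes_per_job, discount=0.0):
    """Data Lake Analytics estimate: AU-minutes plus a per-job charge."""
    per_job = analytics_units * minutes_per_job * ADLA_PER_AU_MINUTE + ADLA_PER_JOB
    return jobs * per_job * (1 - discount)


# Hypothetical workload: 1,000 GB stored, 5 million transactions,
# and 100 jobs that each use 10 Analytics Units for 5 minutes.
monthly_store = store_cost(1000, 5_000_000)
monthly_analytics = analytics_cost(100, 10, 5)
```

For this hypothetical workload, storage comes to about $40.35 per month (40.00 for capacity plus 0.35 for transactions) and analytics to about $87.50 (100 jobs at 0.875 each), before any discount.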
https://azure.microsoft.com/en-us/services/data-lake-analytics/
https://azure.microsoft.com/en-us/services/data-lake-store/
https://azure.microsoft.com/en-us/services/hdinsight/
http://usql.io/
http://azure.github.io/AzureDataLake/
References
Thank You!