Cortana Intelligence Suite Workshop Class Notebook
Classified as Microsoft General
Key Concepts
This session is brought to you by Microsoft’s Analytics and Data Science Team.
Agenda
At the end of this Module, you will:
1. Understand how Azure Data Factory (ADF) fits into the Cortana Intelligence Suite
2. Understand the ADF logical flow
3. Create an ADF instance
4. An example of the ADF process
5. Understand and create the ADF components
This section of the course will cover:
• Cortana Intelligence in a sentence
• The team data science process
• The Cortana Intelligence platform
• Summarizing Cortana Intelligence
Cortana Intelligence is a Platform and a Process to perform advanced analytics from start to finish
1. What you can do with CIS: https://www.microsoft.com/en-us/server-cloud/cortana-intelligence-suite/why-cortana-intelligence.aspx
2. More about the process: https://channel9.msdn.com/Blogs/Seth-Juarez/Understanding-Data-Science-for-building-Predictive-Analytics-Solutions-by-Francesca-Lazzeri
The technologies available in Cortana Intelligence can be categorized into the following areas:
• Information management
• Big data stores
• Machine learning and analytics
• Intelligence
• Dashboards and visualization
Azure SQL Data Warehouse is categorized as a big data store. It differs from Data Lake in that it provides a relational big data store for structured data, although it can also interact with unstructured data.
Azure Data Factory
Creates, orchestrates, & automates the movement, transformation and/or analysis of data through the cloud
1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
2. Developer Reference: https://msdn.microsoft.com/en-us/library/azure/dn834987.aspx
Azure Data Factory Logical Flow
1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
2. Quick Example: http://azure.microsoft.com/blog/2015/04/24/azure-data-factory-update-simplified-sample-deployment/
Create the Data Factory
• Azure Portal
• PowerShell
• Visual Studio
• ARM Templates
1. Setting Up: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
Using the Portal
• Use in Non-MS Clients
• Use for Exploration
• Use when in demo/POC
1. Overview: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
2. Using the Portal: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/
Using PowerShell
• Use in MS Clients
• Use for Automation
• Use for quick set up and tear down
1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
2. Full Tutorial: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
Using Visual Studio
• Use in mature dev environments
• Use when integrated into larger development process
1. Overview: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
2. Using the Portal: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/
Azure Resource Manager Templates
• Use across multiple environments, such as Dev, Test, UAT and Production
• Works well where there are similar patterns
• ARM templates can be parameterized
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-how-to-use-resource-manager-templates
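As a rough illustration of parameterizing one template across environments, the sketch below generates one ARM deployment parameter file per environment. The parameter names (dataFactoryName, storageAccountName) and their values are hypothetical; they would have to match whatever parameters your own template declares.

```python
import json

# Schema URL and layout of an ARM deployment parameter file.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/"
    "deploymentParameters.json#"
)

# Hypothetical per-environment settings; names and values are illustrative only.
ENVIRONMENTS = {
    "dev": {"dataFactoryName": "adf-workshop-dev", "storageAccountName": "workshopdevstore"},
    "test": {"dataFactoryName": "adf-workshop-test", "storageAccountName": "workshopteststore"},
    "uat": {"dataFactoryName": "adf-workshop-uat", "storageAccountName": "workshopuatstore"},
    "prod": {"dataFactoryName": "adf-workshop-prod", "storageAccountName": "workshopprodstore"},
}


def build_parameter_file(settings):
    """Wrap plain settings in the {"value": ...} shape ARM parameter files use."""
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in settings.items()},
    }


# One parameter document per environment; the template itself stays unchanged.
parameter_files = {
    env: build_parameter_file(settings) for env, settings in ENVIRONMENTS.items()
}

print(json.dumps(parameter_files["dev"], indent=2))
```

Each generated document can be saved next to the template and passed to the deployment for the matching environment.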
Create an ADF Instance
1. Open the ADF Student Workbook file from your \Resources folder
2. Follow the steps for Lab 1 to set up the lab environment
3. Then follow the steps for Lab 2 to set up Azure Data Factory
4. Note – There’s a useful JSON prettifier here: http://www.jsoneditoronline.org/
ADF Process
1. Define Architecture: Set up objectives and flow
2. Create the Data Factory: Portal, PowerShell, VS
3. Create Linked Services: Connections to Data and Services
4. Create Datasets: Input and Output
5. Create Pipeline: Define Activities
6. Monitor and Manage: Portal or PowerShell, Alerts and Metrics
1. Full Tutorial: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
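The creation order in steps 3 to 5 (linked services, then datasets, then pipelines) can be sketched as a small sorting helper for a folder of JSON definitions. The classification rule here is an assumption made for illustration: a definition with "activities" is treated as a pipeline, one with "availability" as a dataset, and anything else as a linked service.

```python
# Deployment order taken from the process list: linked services first,
# then datasets, then pipelines.
DEPLOY_ORDER = ["linked service", "dataset", "pipeline"]


def classify(definition):
    """Guess the kind of an ADF JSON definition from its properties (heuristic)."""
    properties = definition.get("properties", {})
    if "activities" in properties:
        return "pipeline"
    if "availability" in properties:
        return "dataset"
    return "linked service"


def deployment_order(definitions):
    """Return the definitions sorted so that dependencies are created first."""
    return sorted(definitions, key=lambda d: DEPLOY_ORDER.index(classify(d)))


# Trimmed-down definitions, listed in the wrong order on purpose.
definitions = [
    {"name": "MyFirstPipeline", "properties": {"activities": []}},
    {"name": "AzureBlobInput", "properties": {"type": "AzureBlob", "availability": {}}},
    {"name": "StorageLinkedService", "properties": {"type": "AzureStorage"}},
]

ordered_names = [d["name"] for d in deployment_order(definitions)]
print(ordered_names)
```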
Example - Churn
[Diagram: Azure Data Factory moves data through three stages: Ingest, Transform & Analyze, and Publish. Labels on the diagram: Data Sources, Call Log Files, Customer Table, Customer Call Details, Customers Likely to Churn, Customer Churn Table]
1. Video of this process: https://azure.microsoft.com/en-us/documentation/videos/azure-data-factory-102-analyzing-complex-churn-models-with-azure-data-factory/
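One way to picture the churn flow is as activities chained through datasets, where an activity can run once the datasets it reads have been produced. The sketch below derives a run order from that idea. The activity names and the exact mapping of datasets to stages are assumptions based on a reading of the diagram, not the definitions used in the video. It needs Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Hypothetical activities for the churn example, deliberately listed out of order.
# Dataset names follow the labels on the diagram.
ACTIVITIES = {
    "PublishChurnTable": {
        "inputs": ["CustomersLikelyToChurn"],
        "outputs": ["CustomerChurnTable"],
    },
    "IngestCallLogs": {
        "inputs": ["CallLogFiles"],
        "outputs": ["CustomerCallDetails"],
    },
    "ScoreChurn": {
        "inputs": ["CustomerCallDetails", "CustomerTable"],
        "outputs": ["CustomersLikelyToChurn"],
    },
}


def execution_order(activities):
    """Order activities so each runs after the activities producing its inputs."""
    # Map every produced dataset to the activity that writes it.
    producer = {
        dataset: name
        for name, activity in activities.items()
        for dataset in activity["outputs"]
    }
    # Datasets with no producer (for example CallLogFiles) are external inputs.
    graph = {
        name: {producer[d] for d in activity["inputs"] if d in producer}
        for name, activity in activities.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(ACTIVITIES)
print(order)
```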
Azure Data Factory Components
1. ADF Components: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-introduction#relationship-between-data-factory-entities
Linked Services
Compute resource
Data transformation activity                                        Compute environment
Hive                                                                HDInsight [Hadoop]
Pig                                                                 HDInsight [Hadoop]
MapReduce                                                           HDInsight [Hadoop]
Hadoop Streaming                                                    HDInsight [Hadoop]
Machine Learning activities: Batch Execution and Update Resource    Azure VM
Stored Procedure                                                    Azure SQL, Azure SQL DW, or SQL Server
Data Lake Analytics U-SQL                                           Azure Data Lake Analytics
DotNet                                                              HDInsight [Hadoop] or Azure Batch
Data Sources
Category     Data store                  Supported as a source    Supported as a sink
Azure        Azure Blob storage          ✓                        ✓
Azure        Azure Data Lake Store       ✓                        ✓
Azure        Azure DocumentDB            ✓                        ✓
Azure        Azure SQL Database          ✓                        ✓
Azure        Azure SQL Data Warehouse    ✓                        ✓
Azure        Azure Search Index                                   ✓
Azure        Azure Table storage         ✓                        ✓
Databases    Amazon Redshift             ✓
Databases    DB2                         ✓
Databases    MySQL                       ✓
Databases    Oracle                      ✓                        ✓
Databases    PostgreSQL                  ✓
Databases    SAP Business Warehouse      ✓
Databases    SAP HANA                    ✓
Databases    SQL Server                  ✓                        ✓
Databases    Sybase                      ✓
Databases    Teradata                    ✓
Other data sources are supported. See the link in the notes for full details.
AZURE SQL DATABASE EXAMPLE
{
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:ctosqldb.database.windows.net,1433;Database=EquityDB;User ID=ctesta-oneill;Password=P@ssw0rd;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
        }
    }
}
AZURE BLOB STORE EXAMPLE
{
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=ctostorageaccount;AccountKey=087ubp097guh8*JON*&B*(97g9879"
        }
    }
}
1. Linked Services: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-introduction#linked-services
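The linked service examples above place credentials directly in the JSON. A minimal sketch of generating the same blob storage definition while keeping the key out of source files is shown below. The environment variable name ADF_STORAGE_KEY is a hypothetical choice, and the placeholder is used when it is not set.

```python
import json
import os


def azure_storage_linked_service(name, account_name, account_key):
    """Build an AzureStorage linked service definition as a Python dict."""
    connection_string = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};AccountKey={account_key}"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


# ADF_STORAGE_KEY is an assumed variable name, not something ADF defines.
account_key = os.environ.get("ADF_STORAGE_KEY", "<account key>")

linked_service = azure_storage_linked_service(
    "StorageLinkedService", "ctostorageaccount", account_key
)

# Avoid printing the connection string itself; show only the structure.
print(json.dumps({"name": linked_service["name"],
                  "type": linked_service["properties"]["type"]}, indent=2))
```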
Datasets
{
    "name": "<name of dataset>",
    "properties": {
        "type": "<type of dataset: AzureBlob, AzureSql etc...>",
        "external": <boolean flag to indicate external data. only for input datasets>,
        "linkedServiceName": "<Name of the linked service that refers to a data store.>",
        "structure": [
            {
                "name": "<Name of the column>",
                "type": "<Name of the type>"
            }
        ],
        "typeProperties": {
            "<type specific property>": "<value>",
            "<type specific property 2>": "<value 2>"
        },
        "availability": {
            "frequency": "<Specifies the time unit for data slice production. >",
            "interval": "<Specifies the interval within the defined frequency.>"
        },
        "policy": { }
    }
}
Callouts on the slide: Dataset name, Properties (Type, External, LinkedServiceName), Structure (Name, Type), Availability, Policy
Linked services shown on the slide: AzureSqlLinkedService, StorageLinkedService
1. Datasets: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
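A quick structural check can catch a dataset definition that is missing a section before it is deployed. The sketch below is an illustration only: it treats name, type, linkedServiceName, typeProperties and availability as required and the other sections of the template as optional, which is an assumption you should confirm against the dataset documentation linked above.

```python
# Sections this sketch treats as required inside "properties" (an assumption).
REQUIRED_PROPERTIES = ("type", "linkedServiceName", "typeProperties", "availability")


def missing_dataset_properties(dataset):
    """Return the names of required sections that are absent from a dataset."""
    missing = []
    if not dataset.get("name"):
        missing.append("name")
    properties = dataset.get("properties", {})
    missing.extend(
        f"properties.{key}" for key in REQUIRED_PROPERTIES if key not in properties
    )
    return missing


complete = {
    "name": "AzureBlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {"folderPath": "datacontainer/inputdata"},
        "availability": {"frequency": "Month", "interval": 1},
        "external": True,
    },
}

# Same dataset with the availability section left out.
incomplete = {
    "name": "AzureBlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {"folderPath": "datacontainer/inputdata"},
    },
}

print(missing_dataset_properties(complete))
print(missing_dataset_properties(incomplete))
```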
Time Slicing Data
Availability
"availability": {
    "frequency": "<Specifies the time unit for data slice production. >",
    "interval": "<Specifies the interval within the defined frequency.>"
},
Offset
"availability": {
    "frequency": "Day",
    "interval": 1,
    "offset": "06:00:00"
}
anchorDateTime
"availability": {
    "frequency": "Hour",
    "interval": 23,
    "anchorDateTime": "2007-04-19T08:00:00"
}
Style
"availability": {
    "frequency": "Day",
    "interval": 1,
    "offset": "06:00:00",
    "style": "EndOfInterval"
}
Output dataset example
{
    "name": "AzureBlobOutput",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "datacontainer/partitioneddata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}
Input dataset example
{
    "name": "AzureBlobInput",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "fileName": "input.log",
            "folderPath": "datacontainer/inputdata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}
1. Time Slicing: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
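To get a feel for how frequency, interval and offset divide a period into slices, the sketch below lists slice windows for a given availability block. It is a simplified model and not the service's actual scheduling logic: it assumes the start time is already aligned to the frequency boundary, keeps only whole slices that end on or before the end time, ignores anchorDateTime and style, and leaves out "Month" because months vary in length.

```python
from datetime import datetime, timedelta

# Fixed-length units only; "Month" is intentionally not modelled here.
UNIT_LENGTH = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
    "Week": timedelta(weeks=1),
}


def parse_offset(text):
    """Turn an "hh:mm:ss" offset string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def slice_windows(start, end, availability):
    """List (slice_start, slice_end) pairs that fit entirely before `end`."""
    step = UNIT_LENGTH[availability["frequency"]] * availability["interval"]
    offset = parse_offset(availability.get("offset", "00:00:00"))
    windows = []
    slice_start = start + offset  # the offset shifts every slice boundary
    while slice_start + step <= end:
        windows.append((slice_start, slice_start + step))
        slice_start += step
    return windows


daily_at_six = {"frequency": "Day", "interval": 1, "offset": "06:00:00"}
windows = slice_windows(datetime(2016, 4, 1), datetime(2016, 4, 4), daily_at_six)

for slice_start, slice_end in windows:
    print(slice_start.isoformat(), "to", slice_end.isoformat())
```

With the 06:00:00 offset, each daily slice in this model runs from 6 AM to 6 AM the next day instead of from midnight to midnight.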
Linked Services and Datasets
1. Open the ADF Student Workbook file from your \Resources folder
2. Follow the steps for Lab 1 to set up the lab environment
3. Then follow the steps for Lab 2 to set up Azure Data Factory
4. Note – There’s a useful JSON prettifier here: http://www.jsoneditoronline.org/
Activities
Data transformation activities
Data transformation activity                                        Compute environment
Hive                                                                HDInsight [Hadoop]
Pig                                                                 HDInsight [Hadoop]
MapReduce                                                           HDInsight [Hadoop]
Hadoop Streaming                                                    HDInsight [Hadoop]
Machine Learning activities: Batch Execution and Update Resource    Azure VM
Stored Procedure                                                    Azure SQL, Azure SQL DW, or SQL Server
Data Lake Analytics U-SQL                                           Azure Data Lake Analytics
DotNet                                                              HDInsight [Hadoop] or Azure Batch
Data movement activities
Example pipeline containing a Hive (data transformation) activity:
{
    "name": "MyFirstPipeline",
    "properties": {
        "description": "My first Azure Data Factory pipeline",
        "activities": [
            {
                "type": "HDInsightHive",
                "typeProperties": {
                    "scriptPath": "adfgetstarted/script/partitionweblogs.hql",
                    "scriptLinkedService": "StorageLinkedService",
                    "defines": {
                        "inputtable": "wasb://[email protected]/inputdata",
                        "partitionedtable": "wasb://[email protected]/partitioneddata"
                    }
                },
                "inputs": [
                    { "name": "AzureBlobInput" }
                ],
                "outputs": [
                    { "name": "AzureBlobOutput" }
                ],
                "policy": {
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Month",
                    "interval": 1
                },
                "name": "RunSampleHiveActivity",
                "linkedServiceName": "HDInsightOnDemandLinkedService"
            }
        ],
        "start": "2016-04-01T00:00:00Z",
        "end": "2016-04-02T00:00:00Z",
        "isPaused": false,
        "hubName": "ctogetstarteddf_hub",
        "pipelineMode": "Scheduled"
    }
}
1. What is an activity: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-create-pipelines#what-is-an-activity
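Because activities refer to datasets and linked services by name, a mistyped name is an easy mistake to make. The sketch below checks the names used by each activity against the names you have defined. The helper and its message format are illustrative and not part of any ADF tooling; the pipeline is a trimmed copy of the example above.

```python
def unresolved_references(pipeline, dataset_names, linked_service_names):
    """Return a message for every dataset or linked service name not defined."""
    problems = []
    for activity in pipeline["properties"]["activities"]:
        for direction in ("inputs", "outputs"):
            for reference in activity.get(direction, []):
                if reference["name"] not in dataset_names:
                    problems.append(
                        f"{activity['name']}: {direction} dataset "
                        f"'{reference['name']}' is not defined"
                    )
        service = activity.get("linkedServiceName")
        if service is not None and service not in linked_service_names:
            problems.append(
                f"{activity['name']}: linked service '{service}' is not defined"
            )
    return problems


pipeline = {
    "name": "MyFirstPipeline",
    "properties": {
        "activities": [
            {
                "type": "HDInsightHive",
                "name": "RunSampleHiveActivity",
                "inputs": [{"name": "AzureBlobInput"}],
                "outputs": [{"name": "AzureBlobOutput"}],
                "linkedServiceName": "HDInsightOnDemandLinkedService",
            }
        ]
    },
}

# AzureBlobOutput is deliberately left out to show a reported problem.
problems = unresolved_references(
    pipeline,
    dataset_names={"AzureBlobInput"},
    linked_service_names={"StorageLinkedService", "HDInsightOnDemandLinkedService"},
)
print(problems)
```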
Pipelines
A pipeline is a grouping of logically related activities.
A pipeline can be scheduled so that the activities within it are executed.
A pipeline can be managed and monitored.
1. Pipelines: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-create-pipelines
Activities and Pipelines
Summary
• ADF orchestrates other technologies to move, transform or analyze data
• There is a broad range of options for creating an ADF instance
• Linked Services can point to data sources or compute resources
• Datasets can be structured or unstructured
• Activities can transform and analyze datasets
• Pipelines group activities so that they can be scheduled and monitored
In this session, you have learned:
• Scale-out distributed query engine
• De-coupled storage from compute
• Fully managed
• Completely elastic
• Platform as a Service (PaaS)
• Petabyte scale
• Leveraging cloud ecosystem
• Broad range of connectivity options
Click on the graphics to explore more learning options from your Advanced Analytics and Data Science team, including:
• Online training
• Videos
• Instructor-led training
• Blogs
• Cortana Intelligence Gallery
Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
For more information, see Microsoft Copyright Permissions at http://www.microsoft.com/permission
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. The Microsoft company name and Microsoft products mentioned herein may be either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
This document reflects current views and assumptions as of the date of development and is subject to change. Actual and future results and trends may differ materially from any forward-looking statements. Microsoft assumes no responsibility for errors or omissions in the materials.
THIS DOCUMENT IS FOR INFORMATIONAL AND TRAINING PURPOSES ONLY AND IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, WHETHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.