
Deploying Big Data Management® 10.2.2 on the Microsoft Azure Cloud Platform through the Azure Marketplace

© Copyright Informatica LLC 2019. Informatica, the Informatica logo, and Big Data Management are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html.

Abstract

Customers of Microsoft Azure and Informatica can deploy Informatica Big Data Management® 10.2.2 through the Azure marketplace. The automated marketplace solution fully integrates Big Data Management with the Azure cloud platform and an Azure HDInsight or Databricks cluster. The installed solution includes several preconfigured mappings that you can use to discover the capabilities of Big Data Management to load, transform, and write data to various Azure storage resources.

Supported Versions

• Big Data Management 10.2.2

Table of Contents

Overview
    The Big Data Management Solution on Azure Marketplace
    Informatica Domain
    Informatica Clients
    Azure Platform Elements
    Implementation Overview
Pre-Implementation Tasks
    Verify Prerequisites
    Gather Azure Resource Information
    Choose the Implementation Type
Provision the Big Data Management on Azure Marketplace Solution
    Begin Provisioning the Big Data Management on Azure Marketplace Solution
    Deploy a Domain and Configure Azure Resources
Monitoring Instance Provision and Informatica Domain Creation
    Logs
Post-Implementation Tasks
    Get the Informatica Administrator IP Address from the Azure Console
    Download, Install, and Configure the Developer Client
    Configure Autodeployed Clusters
Using the Pre-Installed Mappings
    Pre-Installed Mappings
    Ready To Go Scripts
    Run the Mappings
Next Steps

Overview

Customers of Microsoft Azure and Informatica can deploy a big data solution that fully integrates Big Data Management with the Azure cloud platform.

Several different methods are available for deploying Big Data Management:

Hybrid deployment

Install and configure the Informatica domain and Big Data Management on-premises, and configure them to push processing to a compute cluster.

Manual cloud deployment

Manually install and configure the Informatica domain and Big Data Management on Azure cloud platform VMs in the same region as your HDInsight or Databricks compute cluster, or deploy the domain on-premises.

Marketplace cloud deployment

Execute a Big Data Management deployment from the Azure marketplace to create an Informatica domain and a compute cluster in the Azure cloud, and explore Big Data Management functionality through prepackaged mappings.

The Big Data Management marketplace solution on Azure enables you to automate the deployment of the Informatica domain, a compute cluster, and integration with storage and other resources in the Azure cloud platform. When you configure the solution, you make choices about clusters, storage, and other aspects of the deployed solution. The solution includes prepackaged mappings that demonstrate various Big Data Management functionality.

The following diagram shows the architecture of the Big Data Management on Azure marketplace solution:

The numbers in the architecture diagram correspond to items in the following list:

1. A resource group on the Azure platform.

2. A virtual network, or vnet.

3. A subnet to contain the Big Data Management deployment.

4. A Network Security Group to manage access to the Big Data Management deployment, the compute cluster, and the SQL data warehouse.

5. The Informatica domain, including the Model Repository Service and the Data Integration Service.

6. Microsoft SQL Server databases to act as Informatica domain repositories:

• Domain repository database

• Model repository

See “Informatica Domain” for an explanation of each repository.

7. A compute cluster, one of the following:

• HDInsight

• Databricks

8. An SQL data warehouse, to act as a repository for data sources and targets.

9. Azure Data Lake storage (ADLS) or blob storage, to act as a repository for data sources and targets.

The Big Data Management Solution on Azure Marketplace

The solution includes fully configured Azure resources including ADLS and a SQL data warehouse for storage, a compute cluster for processing, and an Informatica domain populated with sample data and mappings.

Informatica Domain

The Informatica domain is a server component that hosts application services, such as the Model Repository Service and the Data Integration Service. These services, together with domain clients, enable you to create and run mappings and other objects to extract, transform, and write data.

Application Services

Model Repository Service

The Model Repository Service manages the Model repository. The Model repository stores metadata created by Informatica products in a relational database to enable collaboration among the products. Informatica Developer, the Data Integration Service, and the Administrator tool store metadata in the Model repository.

Data Integration Service

The Data Integration Service is an application service in the Informatica domain that performs data integration tasks for the Developer tool and for external clients.

The Informatica domain can run several other services. For more information about Informatica services, see the Informatica Application Service Guide.

Domain Repositories

Informatica repositories, hosted on SQL databases, store metadata about domain objects. Informatica repositories include the following:

Domain configuration repository

The domain configuration repository stores configuration metadata about the Informatica domain. It also stores user privileges and permissions.

Model repository

The Model repository stores metadata for projects and folders and their contents, including all repository objects such as mappings and workflows.

For more information about domain repositories, see the Informatica Application Service Guide.

Informatica Clients

You can use several different clients with Informatica Big Data Management:

Administrator tool

The Administrator tool enables you to create and administer services, connections, and other domain objects.

Developer tool

The Developer tool enables you to create and run mappings and other objects that access, transform, and write data to targets.

Command line interface

The command line interface offers hundreds of commands to assist in administering the Informatica domain, creating and running repository objects, administering security features, and maintaining domain repositories.
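
For orientation, the following shell sketch shows the general shape of command line usage from the domain VM. This is an illustration only: the install path, command names, and options are assumptions to verify against the Informatica Command Reference for your version.

# Illustrative only: the path, command names, and options are assumptions;
# verify them in the Informatica Command Reference for your version.
cd /opt/Informatica/isp/bin

# Ping the domain to verify that it is running.
./infacmd.sh isp ping -dn <domain name>

# List the application services that run in the domain.
./infacmd.sh isp listServices -dn <domain name> -un <administrator> -pd <password>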

Azure Platform Elements

The Big Data Management solution automatically creates a virtual network on the Azure cloud platform with the following Azure platform elements:

• A network services gateway

• A compute cluster with varying choices of attached storage resources such as ADLS, WASB (general) storage, or an Azure SQL data warehouse.

You can choose from the following options when you configure the automated deployment to use HDInsight as the compute cluster:

• New WASB storage cluster, new Azure SQL database and new Azure SQL data warehouse

• Existing WASB cluster, new Azure SQL database and new Azure SQL data warehouse

• Existing WASB cluster without an SQL database and data warehouse

• New ADLS storage, new Azure SQL database and new Azure SQL data warehouse

• Auto-deploy an HDInsight cluster during the mapping run

You can choose from the following options when you configure the automated deployment to use Databricks as the compute cluster:

• Use an existing Databricks cluster

• Auto-deploy a Databricks cluster during the mapping run

When you choose the auto-deploy option for the compute cluster, you skip most configuration steps.

Implementation Overview

The following diagram shows how the marketplace solution is implemented:

Pre-Implementation Tasks

Before you start the automated Big Data Management deployment, perform the following pre-implementation tasks:

• Verify prerequisites.

• If you plan to use Azure Data Lake storage with the Big Data Management implementation, gather information about the service principal account.

Verify Prerequisites

Verify the following prerequisites:

• You have a Microsoft Azure subscription.

• You have access and permissions to create the following resources on the Azure platform:

- Virtual network (vnet)

- Network security group

- Storage resources

- Virtual machines

- HDInsight or Databricks cluster

- Azure SQL Server database

- Azure SQL data warehouse

- Azure service principal, when you plan to use HDInsight for the compute cluster

• You have a valid Big Data Management license that you have downloaded to your local machine or a location on your network.

• You have a sufficient quota of CPU cores in the region where you plan to deploy the Big Data Management solution.

The following table lists supported regions for Big Data Management with HDInsight:

Americas: Brazil South, Canada Central, Canada East, Central US, East US, East US 2, North Central US, South Central US, West Central US, West US, West US 2

Europe: Germany Central, Germany Northeast, North Europe, UK South, UK West, West Europe

Asia and Oceania: Australia East, Australia Southeast, Central India, East Asia, Japan East, Japan West, Korea Central, Korea South, South India, Southeast Asia

The following table lists supported regions for Big Data Management with Databricks:

Americas: Brazil South, Canada Central, Canada East, Central US, East US, East US 2, North Central US, South Central US, West Central US, West US, West US 2

Europe: North Europe, UK South, UK West, West Europe

Asia and Oceania: Australia East, Australia Southeast, Central India, East Asia, Japan East, Japan West, Korea Central, Korea South, South India, Southeast Asia

In addition, the solution supports government cloud regions. If your desired region is not listed, contact Informatica Global Customer Support to verify whether the region is supported.

Note: Not all Azure resources are supported in all regions. See the Azure documentation to verify that the resources for your solution are supported in your desired region.

Gather Azure Resource Information

Gather information about storage accounts and other existing resources.

The information you gather depends on whether you plan to use an HDInsight or Databricks cluster as the solution compute cluster.

Using HDInsight-Related Resources

If you plan to use an existing Azure Data Lake storage (ADLS) account with the HDInsight cluster, gather information about the ADLS service principal so that you can configure access to the storage for solution elements. You also need other details about the ADLS account.

The service principal is an Azure user that meets the following requirements:

• Permissions to access required directories in ADLS storage.

• Certificate-based authentication for ADLS storage.

• Key-based authentication for ADLS storage.

The following table lists the information to gather and where to find it in the Azure portal (a command line alternative follows the table):

ADLS storage account name
    Description: Name of the ADLS account.
    Where to find:
    1. Select the Resource Groups tab.
    2. Select the resource group where the storage resides.
    3. Select the ADLS storage resource.
    4. Select the Access keys tab.
    5. Find the Storage Account Name property and copy the value.

ADLS resource group
    Description: Resource group that the ADLS account is a member of.
    Where to find: Select the Resource Groups tab and copy the name of the resource group that the ADLS account is a member of.

ADLS Storage Account Key
    Description: An octet string that provides a key for the ADLS storage associated with the service principal.
    Where to find:
    1. Select the Resource Groups tab.
    2. Select the resource group where the storage resides.
    3. Select the ADLS storage resource.
    4. Select the Access keys tab.
    5. Find the Key property and copy the value.

Service Principal Object ID
    Description: An octet string that provides a key associated with the service principal.
    Where to find:
    1. Select the All Services tab.
    2. In the All services search box, type "enterprise." The portal displays a list of resources whose names contain the word "enterprise."
    3. Select the Enterprise Applications resource.
    4. Under Application Type, select All Applications, then click Apply.
    5. Use the search bar if you know part of the service principal name.
    6. Find the service principal and copy the value under Object ID.

Service Principal Application ID
    Description: ID of the service principal user that represents the HDInsight cluster. Has permissions on the root folder of the ADLS storage account.
    Where to find:
    1. Select the Azure Active Directory tab.
    2. Click App Registrations.
    3. Search for and find the application name.
    4. Copy the value under Application ID.

Service Principal Certificate Content
    Description: The Base64-encoded text of the public certificate used with the service principal. The Azure administrator must generate this certificate content. For more information, see the Azure documentation.

Subscription ID
    Description: ID of the Azure account to use in the cluster creation process.
    Where to find: Open the Azure portal and click the Overview tab to view the Subscription ID property.

Tenant ID
    Description: A GUID string associated with the Azure Active Directory.
    Where to find: Select the Azure Active Directory tab and click Properties. Find the Directory ID property. The value of this property is also the tenant ID.

Client Secret
    Description: An octet string that provides a key for an application associated with the service principal.
    Where to find:
    Attention: The key value is visible only immediately after the administrator creates it. Perform the following steps immediately after key creation is complete. Afterward, the key value is hidden and cannot be copied.
    1. Select the Azure Active Directory tab.
    2. Click App Registrations.
    3. Search for and find the service principal name.
    4. Click Settings.
    5. Click Keys.
    6. Create a key, and immediately copy its value.

Container Name
    Description: Name of the container in which the WASB storage account resides. A container is a virtual collection of services that a platform developer can use to deploy applications.
    Where to find:
    1. Select the Resource Groups tab.
    2. Search for and select the resource group in which the WASB storage resides.
    3. In the resource group properties, find and select the WASB storage resource.
    4. In the storage account properties, copy the name. The storage resource name and the container name are the same.

Tenant Authentication URI
    Description: URI that represents the authorization endpoint.
    Where to find:
    1. Select the Azure Active Directory tab.
    2. Click App Registrations.
    3. Click Endpoints.
    4. Copy the value of the OAUTH 2.0 Authorization Endpoint property.

Data Lake Service Principal OAUTH Token Endpoint
    Description: Endpoint for OAUTH token-based authentication.
    Where to find:
    1. Select the Azure Active Directory tab.
    2. Click App Registrations.
    3. Click Endpoints.
    4. Copy the value of the OAUTH 2.0 Token Endpoint property.
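
If you prefer the command line to the portal, a sketch along the following lines can retrieve several of these values with the Azure CLI. The commands assume Azure CLI 2.x and an authenticated session, and output field names can differ across CLI versions, so verify them against your installation.

# Assumes Azure CLI 2.x and a session authenticated with "az login".
az account show --query id --output tsv        # Subscription ID
az account show --query tenantId --output tsv  # Tenant ID

# Keys for a general-purpose (WASB) storage account:
az storage account keys list --resource-group <resource group> --account-name <storage account> --output table

# Service principal details, including the Application ID. Newer CLI versions
# report the object ID in a field named "id"; older versions use "objectId".
az ad sp list --display-name <service principal name> --output json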

Using an Existing Databricks Cluster

If you plan to use an existing Databricks cluster, gather the following information:

Databricks Cluster ID
    Description: Canonical identifier for the Databricks cluster.
    Where to find:
    1. Open the Databricks workspace.
    2. Select Clusters in the left side pane.
    3. Select the existing cluster to use to run Big Data Management mappings.
    4. In the Advanced Options area, click Tags.
    5. Copy the value of the Cluster ID.

Domain URL for Databricks
    Description: Endpoint identifier for token-based authentication.
    Where to find:
    1. Select the cluster to use to run Big Data Management mappings.
    2. Copy the URL.
    3. Remove the string https:// from the URL.

Databricks Access Token
    Description: The token ID created within Databricks that is required for authentication. Note: If the token has an expiration date, verify that you get a new token from the Databricks administrator before it expires.
    Where to find: The Databricks administrator supplies this value. See the Azure documentation for steps to create the access token.
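
If you already have the access token, you can also read the cluster ID from the Databricks REST API instead of the workspace UI. This is a minimal sketch, assuming the standard /api/2.0/clusters/list endpoint and a workspace domain of the form <region>.azuredatabricks.net:

# Lists the clusters in the workspace; each entry includes a cluster_id.
curl -s -H "Authorization: Bearer <Databricks access token>" \
  https://<region>.azuredatabricks.net/api/2.0/clusters/list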

Choose the Implementation Type

You can choose to create an entirely new implementation or create a new Informatica domain to use existing Azure resources.

Choose from among the available implementation types, as shown in the following image:

Provision the Big Data Management on Azure Marketplace Solution

When you provision the Big Data Management solution on Azure marketplace, you launch the wizard and configure basic properties. Then you go on to configure the solution.

Begin Provisioning the Big Data Management on Azure Marketplace Solution

Use the Azure Marketplace website to provision Azure cluster resources including a Big Data Management deployment.

When you implement the Big Data Management solution on Azure marketplace, you launch the wizard, configure basic properties, and then choose from among three implementation types.

1. Search for and select the Big Data Management 10.2.2 solution.

a. Log in to the Azure marketplace website. Use the search bar to search for Informatica Big Data Management.

b. Select Informatica Big Data Management 10.2.2 BYOL.

Click Get it now to launch the solution wizard.

c. Read the details of the terms of use and click Continue.

The wizard redirects the browser window to the Big Data Management 10.2.2 BYOL solution on the Azure portal.

d. Click the Create button at the bottom of the screen.

A series of panels opens to enable you to configure the solution on the Azure platform.

2. Supply information in the Basics panel, and then click OK.

Configure the following properties:

Subscription
    Select the existing Azure subscription that you want to use for the deployment.

Resource group
    The resource group where you want to stage the deployment. You can create a new resource group or select an existing one.

Location
    Choose the location for the resource group. This should be a location where you already have the VM cores that you want to use for the deployment.

Go on to configure Informatica domain settings.

Deploy a Domain and Configure Azure Resources

Create an Informatica domain and configure new or existing Azure resources to use with it.

1. Supply information in the Informatica Domain Settings panel, and then click OK.

This tab allows you to configure details of the Informatica domain. All properties in this tab are mandatory.

Configure the following properties:

Informatica domain administrator name
    User ID for the Informatica domain administrator account.

Informatica domain password
    Password for the Informatica domain administrator. After you type the password, retype it in the next field.

Informatica license file
    Click the folder icon to browse to the location of the Informatica license file on your local system. When you select the license file and click OK, Azure uploads the file.

2. Supply information in the Node Settings panel, and then click OK.

This tab allows you to configure details of the virtual machines (VMs) that the automated implementation devotes to the solution.

Configure the following properties:

Machine prefix
    Type an alphanumeric string to use as a prefix in the name of each virtual machine in the Informatica domain. For example, if you use the prefix "infa", Azure identifies virtual machines in the domain with this string at the beginning of the name.

VM Username
    Username that you use to log in to the virtual machine that hosts the Informatica domain.

Authentication type
    Authentication protocol that you use to log in to the Informatica domain. Choose from the following options:
    - Password. Choose this option to use a text password.
    - SSH Public Key. Choose this option to use an existing SSH public key.
    Default is Password.

Password
    Password to use to log in to the virtual machine that hosts the Informatica domain. After you type the password, retype it in the next field.

SSH Public Key
    Copy and paste an RSA public key in the single-line format (starting with "ssh-rsa") or the multi-line PEM format. You can generate SSH keys using ssh-keygen on Linux and macOS, or PuTTYgen on Windows; see the sketch after this table.

Machine size
    The machine size for the node that hosts the Informatica domain. Accept the default, or click Change size to configure another size for the node VM.
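
If you do not already have a key pair for the SSH Public Key option, a minimal sketch on Linux or macOS looks like the following; the key file path is an arbitrary example:

# Generate an RSA key pair. The file path is an arbitrary example.
ssh-keygen -t rsa -b 2048 -f ~/.ssh/infa_domain_key

# Print the single-line public key to paste into the SSH Public Key field.
cat ~/.ssh/infa_domain_key.pub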

3. Supply information in the Database Settings panel, and then click OK.

This tab allows you to configure settings for the Informatica domain database, where Informatica stores metadata about domain objects and jobs.

Configure the following properties:

Database machine name
    Name for the virtual machine that hosts the domain database.

Database username
    Username for the administrator of the virtual machine that hosts the database. The compute cluster uses these credentials to log in to the virtual machine where the database is hosted.

Password
    Password for the database machine administrator. After you type the password, retype it in the next field.

Machine size
    The machine size for the database host VM. Accept the default, or click Change size to configure another size for the database host VM.

4. Supply information in the Informatica Big Data Management Settings panel, and then click OK.

This tab allows you to configure the compute cluster.

At the top of the panel, you choose between an HDInsight or Databricks cluster to run jobs on. The following image shows the controls for these choices:

If you choose HDInsight, you choose from the following options:

• New. The deployment creates a new HDInsight cluster.

• Existing. The deployment uses an existing HDInsight cluster.

• AutoDeploy. The deployment creates an ephemeral HDInsight cluster.

Note: If you choose this option, you must perform additional configuration tasks in the Administrator tool after the solution deployment is complete.

• Skip. Choose this option to deploy the solution without creating a compute cluster. You can create a compute cluster later.

If you choose Databricks, you choose from the following options:

• Existing. The deployment uses an existing Databricks cluster.

• AutoDeploy. The deployment creates an ephemeral Databricks cluster.

Note: If you choose this option, you must perform additional configuration tasks in the Administrator tool after the solution deployment is complete.

• Skip. Choose this option to deploy the solution without creating a compute cluster. You can create a compute cluster later.

If you choose to create a new cluster, configure the following properties in the Cluster Details area:

HDInsight Cluster Name
    Name of the HDInsight cluster where you want Informatica to process jobs.

HDInsight Cluster Username
    User login for the cluster. This is usually the same login that you use to log in to the Ambari cluster management tool.

Password
    Password for the HDInsight cluster user. After you type the password, retype it in the next field.

HDInsight Cluster SSH Username
    Account name that you use to log in to the cluster head node.

Password
    Password to access the cluster SSH host. After you type the password, retype it in the next field.

Configure the following properties in the Cluster Configuration area:

Cluster Storage Type
    Choose from the following options for the primary storage type for data on the cluster:
    - Azure Storage
    - Data Lake Store

HeadNode Size
    The machine size for the HDInsight cluster head node. Accept the default, or click Change size to configure another size for the head node.

Cluster worker node count
    Number of worker nodes in the cluster. The worker nodes support job processing. This number does not count the head and gateway nodes, which are created by default.

WorkerNode size
    The machine size for each HDInsight cluster worker node. Accept the default, or click Change size to configure another size for the worker nodes.

If you choose to use an existing HDInsight cluster, configure the following properties in the Cluster Configuration area:

Existing Cluster Resource Group
    Resource group that the cluster is a member of.

HDInsight Cluster Name
    Name of the HDInsight cluster where you want Informatica to process jobs.

HDInsight Cluster Username
    User login for the cluster. This is usually the same login that you use to log in to the Ambari cluster management tool.

Password
    Password for the HDInsight cluster user. After you type the password, retype it in the next field.

HDInsight Cluster SSH Host Name
    Name of the HDInsight cluster SSH host.

HDInsight Cluster SSH Host Username
    Account name that you use to log in to the cluster SSH host.

Password
    Password to access the cluster SSH host. After you type the password, retype it in the next field.

Head Node Hostname of the HDInsight Cluster
    Name of the cluster head node host.

HDInsight Cluster Port
    Port that the HDInsight cluster uses to listen for connections. Default is 8080.

If you choose to autodeploy an HDInsight cluster, configure the following properties in the Cluster Configuration area:

Client ID *
    Client ID of the Azure Active Directory (AAD) tenant that has permission to deploy, access, read, write in a resource group, and delete an HDInsight cluster. The Client ID is the same as the Application ID.

Client Secret *
    The client secret of the Client ID.

* Make a note of the values for properties marked with an asterisk. You supply these values during later configuration steps if you chose to create autodeployed clusters.

If you choose to use an existing Databricks cluster, configure the following properties in the Cluster Configuration area:

Databricks Cluster ID
    Canonical identifier for the existing Databricks cluster.

Databricks Access Token
    Secure authentication token for the Databricks workspace. Example: dapi0ddc437a91479a50ea50b21298ec9e23

Domain URL for Databricks
    Databricks cluster URL. Note: Do not include https://.

After you configure the initial properties for a new cluster, configure storage properties in the Cluster Configuration section.

Note: If you choose to use an existing cluster for the solution, you use the existing storage that is already configured with the cluster and do not configure storage in this step.

At the top of the section, you choose the physical storage type for a new HDInsight cluster. Choose between the following types:

Azure storage

General purpose storage is available in v1 (GPv1) and v2 (GPv2). Both come in standard and premium versions. The standard storage version uses magnetic drives. Only standard storage supports Hadoop.

Azure general storage is also known as WASB storage.

Use this disk storage for data sources and targets.

Data Lake Store

Azure Data Lake Storage provides massively scalable data storage optimized for Hadoop analytics engines. You can use ADLS to archive structured and unstructured data, and access it through Hive, Spark, or the native Informatica run-time engine.

After you choose the storage type, configure the properties for the physical storage choice.

Configure the following properties for Azure storage:

Client ID *
    Client ID of the Azure Active Directory (AAD) tenant that has permission to deploy, access, read, write in a resource group, and delete an HDInsight cluster. The Client ID is the same as the Application ID.

Client Secret *
    The client secret of the Client ID.

* Make a note of the values for properties marked with an asterisk. You supply these values during later configuration steps if you chose to create autodeployed clusters.

Configure the following properties for Azure Data Lake storage:

Data Lake Store Account *
    The Data Lake Store account to use.

Data Lake Store Resource Group *
    The resource group where the Data Lake Store resides.

Data Lake Root Folder
    Path to the Data Lake root folder. You can type any path that begins with / followed by alphanumeric characters.

Cluster Identity Resource URI
    Path to the cluster identity resource. This path should start with https://.

Tenant Authentication URI *
    Path for tenant authentication. This path should start with https://.

AAD Tenant ID *
    ID of the Azure Active Directory (AAD) tenant.

Service Principal Object ID
    The unique object ID of the service principal. The service principal is the account used to access Data Lake Store data.

Service Principal Application ID *
    ID of the service principal user that represents the HDInsight cluster. Has permissions on the root folder of the ADLS storage account.

Service Principal Certificate Contents
    The Base64-encoded text of the public certificate used with the service principal.

Service Principal Certificate Password
    Private key for the service principal. The password enables the service principal to read and write to the Data Lake Store. This private key must be associated with the service principal certificate.

Client ID
    Client ID of the Azure Active Directory (AAD) tenant that has permission to deploy an HDInsight cluster. The Client ID is the same as the Application ID.

Client Secret
    The client secret of the Client ID.

* Make a note of the values for properties marked with an asterisk. You use these values during later configuration steps if you chose to create autodeployed clusters, or when you perform the steps in “Editing and Running the Ready To Go Files.”

5. Supply information in the Create Additional Resources panel, and then click OK.

This panel enables you to choose whether to create resources or use existing ones, and to configure properties for the SQL Server and the SQL data warehouse.

Note: The SQL Server and data warehouse are not mandatory elements of the Informatica deployment. You can skip this step or use existing resources for source and target data. If you choose to create new resources, the deployment script creates connections and populates the resources with sample data.

The following image shows the control to use to make this choice:

If you choose to create an SQL Server, you can also choose to create an SQL data warehouse, or skip creation of the data warehouse. If you choose to use an existing SQL database, you must also supply the name of an SQL data warehouse.

Configure the following properties for an SQL Server:

Server name
    Name of the SQL Server database server.

Server Admin Login
    Database server administrator account ID.

Password
    Password for the database server administrator. Retype the password in the following field.

Configure the following properties for an SQL data warehouse:

Database name
    Name of the SQL data warehouse to create.

DW Database Edition
    Type of the data warehouse to create. Accept the default value. Default: DataWarehouse.

Database Tier
    You can select from several available data warehouse tiers. See the Microsoft Azure website for tier pricing. Default: Gen1.

Database Requested Service Object Name
    Size of the VM instance where the database resides. The default value of this property depends on your choice of database tier. Recommendation: accept the default value after you choose the database tier.

Collation
    Collation type for the data warehouse. Collations provide the locale, code page, sort order, and character sensitivity rules for character-based data types. For more information and for a list and description of available collations, see the Microsoft Azure documentation. Default: SQL_Latin1_General_CP1_CI_AS. Note: When you configure an existing SQL data warehouse, this property is not needed.

6. Supply information in the Infrastructure Settings panel, and then click OK.

Use this tab to set up cluster resources for the Big Data Management implementation.

Note: If you created a storage account for the solution in a previous step, you can skip this step.

Storage account

Storage resource that the virtual machines that run the Big Data Management implementation will use for data storage.

Select from the existing storage accounts or create a new one.

To create a new storage account, click the right pointing arrowhead, as shown in the following image:

CIDR IP Address Range

The CIDR public IP address range that is permitted to access the Informatica server host. For example, 10.0.0.0/24 allows access to the Informatica server host for all IP addresses within the range 10.0.0.0 through 10.0.0.255.

Default: *. The default value allows access to the Informatica domain host from all public IP addresses, which is not recommended for security reasons.

Cluster Virtual Network

Virtual network (vnet) that the compute cluster belongs to. Choose from the following options:

• If you plan to use an existing cluster, select the vnet that the existing cluster belongs to.

• If you chose to create a new HDInsight cluster in the Informatica Big Data Management Configuration panel, you can create a new vnet for the cluster in this step.

To create a new vnet, click the right-pointing arrowhead, as shown in the following image:

Subnets

The subnet of the virtual network, inside which the solution creates all resources.

Click the right-facing arrowhead and choose from among the subnets that are available in the virtual network.

7. Verify the choices in the Summary panel, and then click OK.

8. Read the terms of use in the Create panel, and then click Create.

When you click Create, Azure deploys Big Data Management and creates resources in the environment that you configured.

Monitoring Instance Provision and Informatica Domain Creation

You can use cloud platform dashboards, logs, and other artifacts to see whether cluster creation succeeded and to locate and identify the Informatica domain on the cloud platform.

During Deployment

After you finish configuring the solution and start the deployment process, the Azure dashboard indicates deployment status in the top right corner. The following image shows this indicator:

When you click on the "Deployment in progress..." link, the dashboard displays detailed status of the deployment job, including resources as they are created. The following image shows this display:

When Deployment is Complete

The automated deployment includes the following resources:

• Storage account

• Virtual network (vnet)

• Network security group

• Virtual machine hosting databases

• HDInsight or Databricks cluster, if you have chosen to create one or to include an existing cluster

• Informatica domain

• SQL Server database, if selected

• SQL data warehouse, if selected

Perform these steps to use your Azure dashboard to verify the status of resource deployment:

1. Use the dashboard search bar to search for the resource group that contains the Big Data Management deployment.

The dashboard displays the Overview view of the resource group, with resource deployment status as a clickable link in the upper right corner.

2. Click the resource deployment status link.

The following image shows how the link appears:

When you click the deployment status link, a detail window opens listing the failed and successful deployments, as shown in the following image:

3. Click the Error details link for information about failed resource deployments.

4. Click the Overview tab to see a list of the resources in a resource group. The following image shows a portion of a resource group with a large number of resources:

5. You can click column headings in the display to sort by name, type, or location of the resource.
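
As an alternative to the dashboard, a short Azure CLI sketch such as the following reports per-deployment status in the resource group. It assumes Azure CLI 2.x; older versions use az group deployment list instead.

# Assumes Azure CLI 2.x; older versions use "az group deployment list".
az deployment group list --resource-group <resource group> \
  --query "[].{name:name, state:properties.provisioningState}" --output table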

Logs

After the Big Data Management deployment completes, consult logs to see whether solution elements were created successfully.

You can access the following logs on the VM that hosts the Informatica domain:

Installation log

Records the installation of the Informatica domain, Informatica services, and repositories. In case any of the installation tasks failed, you can check the log for the command that failed.

Filename: Informatica_<version>_Services_<timestamp>.log
Location: /home/Informatica

Azure extension operation logs

Records the installation of Azure resources and services.

Filename: extension.log
Location: /var/log/azure/Microsoft.OSCExtensions.CustomScriptForLinux/1.5.2.2/

Note: A subdirectory of the above path, under /download/0/, contains stdout and errout logs. The directory also contains the file convert.sh, which contains the script that was executed to install Azure resources and services.

Command execution log

This log records the following events:

• Creation of Informatica connections, cluster configurations, and services

• Population of the data warehouse and SQL databases

• Import of sample mappings to the Model repository. This is recorded in the Project importing section of the log.

• Data Integration Service recycling to register all changes.

At the top of the log file is a summary section that lists automated tasks and their status. Beneath the summary section are detailed sections about each task. If any of the tasks failed to complete successfully, you can look at the detailed section for that task to troubleshoot it.

Filename: Oneclicksolution_results.log
Location: /home/<User ID>/
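
A quick way to inspect these logs on the domain VM is a shell session along the following lines; the wildcards stand in for the version and timestamp portions of the installer log name:

# Summary and per-task details of the automated configuration tasks:
tail -n 50 /home/<User ID>/Oneclicksolution_results.log

# Search the installer log for failed commands:
grep -i error /home/Informatica/Informatica_*_Services_*.log

# Follow the Azure custom script extension log:
tail -f /var/log/azure/Microsoft.OSCExtensions.CustomScriptForLinux/1.5.2.2/extension.log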

Post-Implementation Tasks

After the marketplace solution is finished deploying, perform the tasks in this section.

Get the Informatica Administrator IP Address from the Azure Console

After deployment is complete, you can look in the Azure console for a property that allows you to access the Administrator tool.

The Administrator tool enables you to administer the Informatica domain, services, repository objects, users and roles, and all other aspects of the domain.

1. In the Azure console, select the resource group where you deployed Big Data Management.

2. Select the VM instance where the Data Integration Service is deployed.

The portal displays properties for the VM instance.

The following image shows an example of how the portal displays VM properties:

3. Move the mouse over the value of the Public IP Address property.

A Click to Copy control appears to the right. Click it to copy the IP address.

4. Edit the hosts file of the local machine where your web browser resides, and add the IP address to it.

On Windows, the file is located at the following path: C:\Windows\System32\drivers\etc\hosts

The resulting hosts file might look like this, with the Informatica domain entry on the last line:

# Copyright (c) 1993-2006 Microsoft Corp.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host
#
# localhost name resolution is handled within DNS itself.
#       127.0.0.1       localhost

10.20.30.40 hn0-hdiwas.4rfn4okocaerpji0yhys468e0g.ix.internal.cloudapp.net

5. Use the IP address with the default port 6008 to open the Administrator tool in a separate browser window. For example:

https://10.20.30.40:6008/Administrator/

You can also use the VM name. For example:

https://<VM name>:6008/Administrator/

Tip: Bookmark the Administrator tool URL to allow for quick future access.

Download, Install, and Configure the Developer Client

Use the Developer tool client to open, edit, and run the preconfigured mappings that are installed with the Model repository, as well as mappings that you design.

You can install the developer tool client on the Azure cloud platform in the same vnet as the Informatica domain, or on another network on-premises.

1. Download the Developer tool installer from the download site.

a. Open the email message that you received from Informatica containing the Big Data Management license. The message also contains a URL that you can use to download the Developer tool installer from the Informatica download site.

b. Download the installation .zip file from the Informatica Electronic Software Download site to a directory on your machine and then extract the installer files.

2. Run the Developer tool installer.

3. If you install the Developer tool outside the domain vnet, you must add the domain IP address and VM name to the hosts file on the Windows machine where the Developer tool runs.

The Hosts file is used by the operating system to map human-friendly hostnames to numerical Internet Protocol (IP) addresses which identify and locate a host in an IP network.

If you install the Developer tool on the Data Integration Service domain VM in the Azure vnet, skip this step.

a. Log into the machine where the Developer tool is installed.

b. Open the hosts file for editing.

The hosts file is located at the following path: C:\Windows\System32\drivers\etc\hosts

c. Add an entry to the hosts file for the domain IP address and fully qualified host name.

For example, the resulting hosts file might look like this, with the Informatica domain entry on the last line:

# Copyright (c) 1993-2006 Microsoft Corp.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host
#
# localhost name resolution is handled within DNS itself.
#       127.0.0.1       localhost

10.20.30.40 hn0-hdiwas.4rfn4okocaerpji0yhys468e0g.ix.internal.cloudapp.net

4. Launch the Developer tool.

The first time that the Developer tool launches, it displays the Welcome page.

5. Click the icon in the upper right corner of the display to open the Developer tool Workbench.

The Workbench is the primary developer tool user interface.

6. Connect to the Informatica domain on the Azure platform.

a. Click Window > Preferences.

The Preferences dialog box appears.

b. Select Informatica > Domains.

c. Click Add.

The New Domain dialog box appears.

d. Enter the domain name, host name, and port number.

e. Click Finish, and then click OK.

Configure Autodeployed Clusters

If you chose autodeployed HDInsight or Databricks clusters for the Big Data Management 10.2.2 solution, you configure and create these clusters after the solution is deployed.

Autodeployed clusters are ephemeral clusters that you create to run specific workflow tasks. The Data Integration Service creates autodeployed clusters using cluster workflows.

You use the Developer tool to create a cluster workflow and configure the clusters that it creates. When you run the cluster workflow, it creates the cluster, runs jobs on the cluster, and terminates the cluster when jobs are complete.

To create and use cluster workflows in your deployed solution, see the chapter on Cluster Workflows in the Big Data Management User Guide.

Using the Pre-Installed Mappings

The marketplace solution contains sample preconfigured mappings that you can use as templates for your own mappings.

This section lists and describes the mappings and contains instructions for how to run them.

Pre-Installed Mappings

Use the Developer tool to open and run the pre-installed mappings that the automated deployment contains.

Browse the folders in the Informatica_BDM_Sample project to access the pre-installed mappings. The following image shows the pre-installed objects:

The following table lists pre-installed sample mappings that you can use with Azure HDInsight:

m_Ingest_Lines, m_Ingest_Orders
    Demonstrates moving data from a Microsoft SQL database to a Blob target using a new or existing HDInsight cluster.

m_Ingest_Lines_1, m_Ingest_Orders_1
    Demonstrates moving data from a Microsoft SQL database to a Blob target using an autodeployed HDInsight cluster.

m_Ingest_Lines_2, m_Ingest_Orders_2
    Demonstrates moving data from a Blob source to a Blob target using an autodeployed HDInsight cluster.

m_Ingest_Lines_3, m_Ingest_Orders_3
    Demonstrates moving data from a Blob source to a Blob target using an existing Databricks cluster.

m_Proccess_Orders
    Demonstrates moving data from Blob (lineitem) and Blob (Orders) sources to a data warehouse target using a new or existing HDInsight cluster.

m_Proccess_Orders_1
    Demonstrates moving data from Blob (lineitem) and Blob (Orders) sources to a data warehouse target using an autodeployed HDInsight cluster.

m_Proccess_Orders_2
    Demonstrates moving data from Blob (lineitem) and Blob (Orders) sources to a data warehouse target using an autodeployed HDInsight cluster.

m_Proccess_Orders_3
    Demonstrates moving data from Blob (lineitem) and Blob (Orders) sources to a data warehouse target using an existing Databricks cluster.

Ready To Go Scripts

Use a Ready To Go script to prepare the pre-installed mappings to run in your environment.

The automated deployment populates the /opt/Informatica/Archive/BDMFiles/ready_to_go/ directory on the Informatica domain machine with three sets of files. Each set consists of the following files:

Input file

The input.properties file contains a series of properties that you populate with information about cluster storage resources, including authentication information, which the pre-installed mappings require to run successfully.

Script file

The script edits the pre-installed mappings with values from the input file. It is not necessary to edit the script file.

Use the Ready To Go file set that matches your deployment type. The following table shows which Ready To Go file set to use for your environment:

New HDInsight cluster with WASB storage, with or without a new SQL database and new SQL data warehouse
    Script file: ready_to_go_new_wasb_cluster.sh
    Input file: ready_to_go_new_wasb_cluster_input.properties

New HDInsight cluster using a new Azure Data Lake store, with or without a new SQL database and new SQL data warehouse
    Script file: ready_to_go_adls.sh
    Input file: ready_to_go_adls_input.properties

Existing HDInsight cluster, with or without a new SQL database and new SQL data warehouse
    Script file: ready_to_go_existing_cluster.sh
    Input file: ready_to_go_existing_cluster_input.properties

Autodeployed Databricks cluster
    Script file: databrickupdate.sh

Editing and Running the Ready To Go Files

To access and run the Ready To Go script files, perform the following steps:

1. On the Informatica domain machine, browse to the following directory: /opt/Informatica/Archive/BDMFiles/ready_to_go/

2. Open the input.properties file for your deployment type, and populate the following properties:

The values for each property are listed in the topic “Gather Azure Resource Information.”

Storage Account Name
    Name of the storage account as it appears in the resource list.

Storage Account Key
    Unique key of the storage resource.

Container name
    Name of the container in which the WASB storage account resides.
    Note: This value appears only in the following files:
    • ready_to_go_adls_input.properties
    • ready_to_go_existing_cluster.sh. In this file, the container must be the one in which both the WASB storage and the HDInsight cluster reside.

ADLS Account Name
    Name of the Azure Data Lake storage account as it appears in the resource list.

ADLS Application ID
    ID string for the application.

ADLS Key
    Unique key of the Azure Data Lake storage resource.

ADLS Token Endpoint
    A URL representing the OAUTH 2.0 token endpoint of the Azure Data Lake storage resource. To get this value, click the App Registrations tab.

Note: The input.properties files contain sample values. Be sure to replace the sample values with real values that you obtain from the Azure portal.
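
For orientation, an input file for the new-WASB deployment type might look roughly like the following. The key names here are hypothetical placeholders, not the actual keys in the shipped file: keep the keys that the deployment installs and replace only the sample values with values from the Azure portal.

# Hypothetical sketch of ready_to_go_new_wasb_cluster_input.properties.
# Key names are invented for illustration; keep the keys in the installed
# file and replace only the values.
STORAGE_ACCOUNT_NAME=mystorageaccount
STORAGE_ACCOUNT_KEY=<storage account key copied from the Azure portal>
CONTAINER_NAME=mycontainer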

3. To run mappings on an autodeployed HDInsight cluster:

a. In the mapping General properties, verify that the Hadoop connection in the workflow is selected.

b. In the Advanced Properties of the Create Cluster task, set the Azure Cluster Location property to the region where you deployed the resource group for the solution.

The following image shows the property to edit:

4. To use the Databricks cluster, update the installed Databricks connection by running the updateDatabricks.sh script.

a. Browse to the ready_to_go directory:

cd /opt/Informatica/Archive/BDMFiles/ready_to_go

b. Run the updateDatabricks.sh script using the following command:

sh updateDatabricks.sh <Domain Administrator> <Domain Password> <Databricks URL> <Databricks access token>

where the command arguments are defined as follows:

Domain Administrator

Informatica domain administrator ID

Domain Password

Password for the domain administrator

Databricks URL

Domain URL of the Databricks instance. Remove the string https:// from the URL.

Databricks Access Token

The token ID created within Databricks required for authentication.

For more information, see “Using an Existing Databricks Cluster.”

For example:

sh updateDatabricks.sh Informatica Informatica@123 southeastasia.azuredatabricks.net dapif8fe3c62efae4890ace2340c8085ac4e

5. To run mappings on an autodeployed Databricks cluster:

a. Run the Ready To Go script to update the Databricks connection with the access token and domain URL.

b. In the mapping General properties, verify that the Databricks connection in the workflow is selected.

c. In the Developer tool, create a cluster workflow and configure it to create an ephemeral Databricks cluster. Edit the Spark Configurations property in the advanced properties of the workflow Create Cluster task to add the following string:

spark.hadoop.fs.azure.account.key.<storage name>.blob.core.windows.net='<storage key>'

The following image shows the Spark Configurations property in the advanced cluster properties:

Run the Mappings

Each of the included mappings is configured to use the Spark run-time engine. You must use the Spark run-time engine to run the preconfigured mappings.

To run a mapping, deploy the application that contains it:

1. In the Developer tool, browse to the application in the Applications folder.

2. Right-click the application and choose Deploy.

For more information about application deployment, see the Informatica Developer Tool Guide.

Ready to Go Script Logs

The following logs record Ready to Go script execution:

• ready_to_go_new_wasb_cluster.log

• ready_to_go_existing_cluster.log

• ready_to_go_adls.log

Location: /home/<OS user name>/

Next Steps

After you use the pre-installed mappings to learn how Big Data Management accesses and transforms data on the Azure platform, you can use the Developer tool to create and run your own mappings in the deployment environment that you created.

To learn more about how to use Big Data Management, read the Big Data Management documentation. Each of these guides is available in the Big Data Management documentation set on the Informatica Documentation Portal at https://docs.informatica.com.

Informatica Application Service Guide

Describes the Model Repository Service, Data Integration Service, and other application services that Big Data Management uses.

30

Page 31: Marketplace Platform through the A zure 10.2.2 on … Library/1/1295...The automated marketplace solution fully integrates Big Data Management with the A zure cloud platform and an

Big Data Management User Guide

Describes how to use Informatica Developer and Informatica Administrator to manage connections between the Informatica domain and the cluster, and how to create mappings in the Developer tool.

Informatica Developer Tool Guide

Contains full details about how to use the Developer tool to create and run mappings and workflows.

Informatica Developer Mapping Guide

Contains full details about how to develop mappings in the Developer tool.

Informatica Developer Mapping Transformation Guide

Contains details about each of the transformations that are available to use in mappings.

Author

Mark Pritchard
