Copyright © Huawei Technologies Co., Ltd. 2018. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.
Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base
         Bantian, Longgang
         Shenzhen 518129
         People's Republic of China
Website: http://www.huawei.com
Email: [email protected]
Issue 05 (2018-01-30)    Huawei Proprietary and Confidential    Copyright © Huawei Technologies Co., Ltd.
Contents
1 Introduction
1.1 What Is DPS?
1.2 Application Scenarios
1.3 Functions
1.3.1 Pipeline Creation and Management
1.3.2 Pipeline Scheduling
1.3.3 Pipeline Monitoring
1.3.4 Connector Creation and Management
1.3.5 Resource Creation and Management
1.4 Related Services
1.5 Permissions Required for Accessing DPS
1.6 Restrictions
1.7 Basic Concepts
2 Getting Started
2.1 Using MRS and OBS to Process Data
2.2 Using MRS, OBS, and RDS to Process Data
3 Installing DPS Agent
3.1 Overview
3.1.1 Introduction to DPS Agent
3.1.2 Installation Flow
3.2 Installation Preparation
3.2.1 Purchasing Elastic Cloud Server (ECS)
3.2.2 Obtaining an AK/SK Pair
3.2.3 Installing JRE
3.2.4 Configuring hosts File
3.3 Deploying DPS Agent
3.3.1 Installing DPS Agent
3.3.2 Configuring DPS Agent
3.3.3 Starting DPS Agent
3.3.4 Verifying DPS Agent
3.3.5 Stopping DPS Agent
3.4 (Optional) Connecting to DWS Cluster
3.5 Common Operations
3.5.1 Binding EIP
3.5.2 Unbinding EIP
3.5.3 Configuring Security Group
3.5.4 Generating API Gateway Certificate
3.5.5 Using the WCC Tool to Encrypt Passwords
3.5.6 Modifying the Run User and User Group of DPS Agent
3.5.7 Resetting the Password of API Gateway Certificate
4 Working With DPS
4.1 Pipeline Manager
4.1.1 Buying a Pipeline
4.1.2 Editing a Pipeline
4.1.3 Scheduling a Pipeline
4.1.4 Monitoring a Pipeline
4.1.5 Exporting a Pipeline
4.1.6 Stopping a Pipeline
4.1.7 Deleting a Pipeline
4.2 Connector List
4.2.1 Creating a DataSource Connector
4.2.2 Creating a CDM Connector
4.2.3 Creating an ESSource Connector
4.2.4 Editing a Connector
4.2.5 Deleting a Connector
4.3 Resource List
4.3.1 Creating a DIS Resource
4.3.2 Creating an MRS Resource
4.3.3 Creating a CDM Resource
4.3.4 Editing a Resource
4.3.5 Deleting a Resource
5 Configuration Guide
5.1 Data Sources
5.1.1 RDS
5.1.2 HBase
5.1.3 HDFS
5.1.4 OBS
5.1.5 DWS
5.1.6 CDM Source
5.1.7 Dummy
5.1.8 UQuery Table
5.1.9 ES Storage
5.2 Activities
5.2.1 HDFS->HBASE
5.2.2 HDFS<->OBS
5.2.3 Database<->HDFS
5.2.4 UQuery<->OBS
5.2.5 CDM Job
5.2.6 ExecuteCDM
5.2.7 Spark
5.2.8 SparkSQL
5.2.9 Hive
5.2.10 MapReduce
5.2.11 Shell Script
5.2.12 MachineLearning
5.2.13 Elasticsearch
5.2.14 RDS SQL
5.2.15 DWS SQL
5.2.16 UQuery SQL
5.2.17 Create OBS
5.2.18 Delete OBS
6 FAQs
6.1 What Is DPS?
6.2 Which Services Can DPS Schedule?
6.3 How Many Pipelines Can I Create Using the DPS Console?
6.4 What Can DPS Do?
6.5 What Is a Pipeline?
6.6 What Is a Data Source?
A Change History
1 Introduction
1.1 What Is DPS?
Overview
Data Pipeline Service (DPS) is a web service running on the public cloud. It enables you to easily automate the movement and transformation of data between different services.

With DPS, you can define a pipeline that describes data processing tasks, their execution sequence, and the task scheduling plan. DPS then schedules and controls the execution of tasks based on the pre-defined scheduling plan and task relationships, achieving inter-service data processing and movement.
Highlights

- Visualized orchestration
  Pipelines are defined in a drag-and-drop manner on a clear GUI, without requiring complex programming. Templates can be imported and exported, and multiple data sources and data processing activities are supported.
- Flexible scheduling
  Three scheduling modes are available: periodic, event-driven, and manual. Multiple execution policies are supported, including precondition, failure policy, timeout, and retry, and pipelines run automatically.
- Cost effectiveness
  Usage prices are low, and compute and storage resources are created and released dynamically, minimizing DPS expenses.
- Solid reliability
  A unified console shows pipeline status in real time. Pipeline operation is retried and recovered automatically, and a notification is sent automatically if an exception occurs.
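The failure policy, timeout, and retry mentioned among the execution policies can be pictured with a minimal scheduler sketch. This is illustrative pseudologic only; the function and parameter names are invented for the example and are not part of any DPS API.

```python
import time

def run_activity(activity, max_retries=3, timeout_s=3600, retry_interval_s=60):
    """Illustrative retry loop: run an activity, retrying on failure up to
    max_retries attempts, and give up once the overall timeout elapses."""
    deadline = time.time() + timeout_s
    for attempt in range(1, max_retries + 1):
        if time.time() > deadline:
            return "TIMEOUT"
        try:
            activity()           # the task body (e.g. submitting a job)
            return "SUCCESS"
        except Exception:
            if attempt == max_retries:
                return "FAILED"  # failure policy: report the failure and stop
            time.sleep(retry_interval_s)
```

A real scheduler would also persist state and emit notifications; the sketch only shows how the three policies interact.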
1.2 Application Scenarios

DPS is applicable to the following scenarios:
- Data movement between services
  For example, you have accumulated a certain amount of data on a service that you purchased and want to transfer data between this service and other services. DPS sets up a data transmission channel between services and provides activities for concurrent data transmission. This allows you to move data between services.
- Scheduled batch task execution
  Deep data analysis often requires a variety of complex tasks. With DPS, you can schedule and run these tasks as pipelines with only a few simple configurations.
1.3 Functions
1.3.1 Pipeline Creation and Management

- DPS provides a graphical pipeline editor. This allows you to orchestrate and edit data sources and activities through drag-and-drop operations and build service-based pipelines.
- DPS can integrate with various data sources, such as RDS, OBS, Hadoop Distributed File System (HDFS), and HBase. For details, see Data Sources.
- DPS has a series of pre-packaged activities, enabling you to reliably process or move data. For details, see Activities.
- DPS supports pipeline file import and export. You can export pipeline files to your local PC and import pipeline files to create or edit pipelines.
- DPS provides pre-defined templates, which can be used to create pipelines quickly.
1.3.2 Pipeline Scheduling

- To achieve efficient data processing, DPS supports two scheduling modes:
  - Periodic scheduling: In a given period, DPS automatically runs the pipeline at a specified interval (by month, week, day, hour, or minute).
  - Manual scheduling: You manually trigger the running of a pipeline. In this scheduling mode, the pipeline is run only once.
- While a pipeline is running, you can pause it or stop the schedule of the pipeline.
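Periodic scheduling boils down to running a pipeline at a fixed interval from a start time. As a rough sketch of that computation (the helper below is invented for illustration and is not a DPS interface):

```python
from datetime import datetime, timedelta

def next_runs(start, interval, count):
    """Yield the first `count` scheduled run times for a pipeline that
    runs periodically at a fixed interval beginning at `start`."""
    t = start
    for _ in range(count):
        yield t
        t += interval

# For example, a pipeline scheduled hourly from midnight on 2018-01-30:
runs = list(next_runs(datetime(2018, 1, 30), timedelta(hours=1), 3))
# runs == [00:00, 01:00, 02:00 on 2018-01-30]
```

Monthly intervals need calendar-aware arithmetic rather than a plain `timedelta`; the sketch covers only the fixed-interval cases (week, day, hour, minute).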
1.3.3 Pipeline Monitoring

DPS allows you to view:

- Current and historical running details of pipelines.
- Activity running details of each pipeline.
1.3.4 Connector Creation and Management

DPS supports connector creation and management. With this function, you can directly use a created and configured connector as a data source, eliminating the need for duplicate data source configurations.
1.3.5 Resource Creation and Management

DPS provides a resource list, which allows you to create and manage cloud service resources. Using this list, you can configure resource management and scheduling tasks to automatically create and delete resources, facilitating the use of other cloud service resources.
1.4 Related Services

DPS works with the following services:
- MapReduce Service
  Big data activities supported by DPS run on MapReduce Service (MRS).
- Object Storage Service
  Object Storage Service (OBS) stores data, including the input data and output data of jobs.
  - Input data: user programs and data files.
  - Output data: result files and log files output by a job.
- Relational Database Service
  Relational Database Service (RDS) stores the input and output data of relational databases and processes data.
- Elastic Cloud Server
  Elastic Cloud Server (ECS) is used to deploy DPS Agent. DPS schedules the DPS Agent deployed on the ECS to execute tasks.
- Key Management Service
  Key Management Service (KMS) is used to encrypt and decrypt passwords and private keys that DPS uses to connect to storage or compute resources.
- Data Warehouse Service
  Data Warehouse Service (DWS) is used to store the input and output data of data warehouses and process data.
- Data Ingestion Service
  DPS allows you to manage Data Ingestion Service (DIS). That is, you can create and delete DIS streams on the DPS console.
- Cloud Data Migration
  DPS uses Cloud Data Migration (CDM) to orchestrate and schedule cloud data.
- Machine Learning Service
  DPS uses Machine Learning Service (MLS) to implement data orchestration and scheduling related to machine learning.
- Unified Query Service
  Unified Query Service (UQuery) is a fully managed data query service. With auto scaling and standard SQL interfaces, UQuery enables you to easily explore and analyze on-cloud data.
- Elasticsearch Service
  Elasticsearch Service (ES) provides a distributed RESTful data search and analysis engine.
- Identity and Access Management
  Identity and Access Management (IAM) authenticates access to DPS.
1.5 Permissions Required for Accessing DPS
Background

DPS uses access control lists (ACLs) to control users' permissions to data.
The MetaDB stores the configurations of pipelines created by users as well as the ACLs of the pipelines. When a user attempts to retrieve a pipeline, DPS checks the user's identity information to determine whether the user has permission to access the pipeline. This protects pipelines against unauthorized access and prevents information disclosure.
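The ACL check described above can be sketched as follows. The data layout and function name are invented for illustration; they do not reflect the actual MetaDB schema or any DPS interface.

```python
def can_access(user, pipeline_acl):
    """Return True if the user may access the pipeline: either the user
    is listed directly in the ACL, or one of the user's groups is."""
    if user["name"] in pipeline_acl.get("users", []):
        return True
    return any(g in pipeline_acl.get("groups", []) for g in user["groups"])

# A pipeline readable by user "alice" and by the "DPS Administrator" group:
acl = {"users": ["alice"], "groups": ["DPS Administrator"]}
print(can_access({"name": "bob", "groups": ["DPS Administrator"]}, acl))  # True
print(can_access({"name": "carol", "groups": []}, acl))                   # False
```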
Permission List

User operation permissions vary with the user groups to which the users belong.

Permissions required for creating a user and creating or modifying a user group must be set on the IAM console. For details, see the Identity and Access Management User Guide.
Table 1-1 describes the permissions of different user groups.
Table 1-1 Permission list

Node Name: Base
Permission Name: Tenant Administrator
Managed Cloud Resource: All services
Description: Permissions to operate all cloud resources owned by an enterprise.

Node Name: DPS
Permission Name: DPS Administrator
Managed Cloud Resource: Data Pipeline Service (DPS)
Description:
Users with both the Tenant Administrator and DPS Administrator permissions can perform the following operations:
- Create, delete, modify, and export pipelines; query the pipeline list.
- Run and stop pipelines; set the schedule configurations for pipelines.
- Create, delete, and modify connectors; query the connector list.
- Create, delete, and modify resources; query the resource list.
Users with only the DPS Administrator permission can perform the following operations:
- Delete, modify, and export pipelines; query the pipeline list.
- Stop pipelines; set the schedule configurations for pipelines.
- Delete connectors; query the connector list.
- Query the resource list.
1.6 Restrictions

Before using DPS, note the following restrictions to ensure that DPS runs properly:
- Recommended browsers for logging in to DPS:
  - Google Chrome 43.0 or later
  - Mozilla Firefox 38.0 or later
  - Internet Explorer 9.0 or later
  NOTE: Login to the DPS console through Internet Explorer 9.0 may fail. This is because some Windows operating systems (such as Windows 7 Ultimate) disable the admin user by default. You are advised to run the browser as the admin user.
- Do not delete existing processes or files on the DPS Agent node. Otherwise, the Agent will become abnormal, affecting cluster and task running.
1.7 Basic Concepts
Regions and AZs
A region is a geographic area where resources used by your DPS services are located.
DPS services in the same region can communicate with each other over an intranet, but DPS services in different regions cannot.
Public cloud data centers are deployed worldwide, in places such as North America, Europe, and Asia. Creating DPS services in different regions can better suit certain user requirements. For example, applications can be designed to meet the requirements of users in specific regions or to comply with local laws or regulations.

Each region contains multiple availability zones (AZs) whose power and networks are physically isolated. AZs in the same region can communicate with each other over an intranet. Each AZ provides cost-effective, low-latency network connections that are unaffected by faults that may occur in other AZs.
Project

A project is a collection of resources and the minimum unit for user authorization. Users' resources must be mounted to a project. DPS projects are used to isolate resources between different departments, different program teams, or different environments (such as R&D, test, and production environments) under the same program team.
2 Getting Started
2.1 Using MRS and OBS to Process Data

The procedure for using MRS and OBS to process data is as follows:
1. Scenario
2. Step 1: Logging In to DPS
3. Step 2: Creating Pipeline
4. Step 3: Configuring Pipeline
5. Step 4: Scheduling Pipeline
6. Step 5: Viewing Pipeline Running Information
Scenario
This section illustrates how to use DPS to transfer and process OBS data on the public cloud and save the processed data to a specified OBS bucket.
Figure 2-1 shows the data processing flow.
Figure 2-1 Data processing flow
Data transfer and processing flow:
1. The MapReduce activity of DPS transfers OBS data of the public cloud and the programs developed by the user to MRS.
2. After MRS processes the data, it stores the processed data in a specified OBS bucket.
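The two steps above form a small dependency graph: an OBS source feeds a MapReduce activity, which writes its results back to OBS. A schematic sketch of how such a graph determines execution order (node names are invented; this is not DPS's pipeline file format):

```python
# Each node maps to its upstream dependencies.
pipeline = {
    "obs_input":  [],                # OBS bucket holding input data and the user program
    "mapreduce":  ["obs_input"],     # MRS MapReduce activity processing the data
    "obs_output": ["mapreduce"],     # OBS bucket receiving the results
}

def execution_order(graph):
    """Topologically sort the pipeline so every node runs after its inputs."""
    order, done = [], set()
    def visit(node):
        if node in done:
            return
        for dep in graph[node]:
            visit(dep)
        done.add(node)
        order.append(node)
    for node in graph:
        visit(node)
    return order

print(execution_order(pipeline))  # ['obs_input', 'mapreduce', 'obs_output']
```

This ordering is what a scheduler enforces when it runs the pipeline: the MapReduce activity cannot start before its OBS input is available.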
Logging In to DPS
Step 1 Log in to the management console.
Step 2 Choose All Services > EI Enterprise Intelligence > Data Pipeline Service. The DPS console is opened.
----End
Creating Pipeline
Step 1 Click in the upper left corner of the DPS console and select your region and project.
Step 2 On the Pipeline Manager page, click Buy Pipeline.
Figure 2-2 Buying a pipeline
Step 3 On the Specify Details page, configure the required parameters (as shown in Figure 2-3) and click Buy Now.
Figure 2-3 Specifying service details
Step 4 On the Confirm Specifications page, confirm your order information, and click Next.
Figure 2-4 Confirming order information
Step 5 On the Pay page, select a payment mode and click OK.
After the pipeline is successfully bought, the system redirects you to the Pipeline Manager page.
----End
Configuring Pipeline
Step 1 On the Pipeline Manager page, click Edit in the Operation column for the newly created pipeline.

Step 2 Drag and drop two OBS data sources and one MapReduce activity to the edit grid area, and connect them as shown in Figure 2-5.
Figure 2-5 Connecting the data sources and activity
Step 3 Click the data sources and activity one by one. On the configuration page that is displayed at the right side of the edit grid area, configure the required parameters.

- Data source: For details about how to configure the data source, see Data Sources.
- Activity: For details about how to configure the activity, see Activities.
Step 4 Click . The system checks the parameter validity of the pipeline.
In the displayed dialog box with the message "Are you sure you want to save the pipeline?",click Yes. If the pipeline is valid, it is saved successfully.
----End
Scheduling Pipeline
Step 1 On the Pipeline Manager page, click Schedule in the Operation column for the newly created pipeline.

The Schedule Pipeline dialog box is displayed. Configure the pipeline schedule task as shown in Figure 2-6.
Figure 2-6 Configuring the pipeline schedule task
Step 2 Click OK.
Step 3 Click Run in the Operation column to start the schedule task for the pipeline.
----End
Viewing Pipeline Running Information
Viewing Pipeline Running Status:
Step 1 On the Pipeline Manager page, click the name of the newly created pipeline. You can view the pipeline running information in the Running History area of the displayed page.
Step 2 Click to refresh the pipeline and activity running information.
NOTE
To view the running status of each activity in the pipeline, click at the left side of each running record.
Figure 2-7 Viewing activity running status
----End
Viewing the Output OBS Data:
Step 1 Log in to OBS Browser.
For details, see Object Storage Service Browser Operation Guide.
Step 2 Go to the OBS bucket or directory that stores output data, and view detailed files.
----End
2.2 Using MRS, OBS, and RDS to Process Data
The procedure for using MRS, OBS, and RDS to process data is as follows:
1. Scenario
2. Step 1: Logging In to DPS
3. Step 2: Creating Pipeline
4. Step 3: Configuring Pipeline
5. Step 4: Scheduling Pipeline
6. Step 5: Viewing Pipeline Running Information
Scenario
This section illustrates how to use DPS to transfer and process OBS data on the public cloud and save the processed OBS data to RDS.
Figure 2-8 shows the data processing flow.
Figure 2-8 Data processing flow
Data transfer and processing flow:
1. The MapReduce activity of DPS transfers OBS data of the public cloud and the programs developed by the user to MRS.
2. After MRS processes the data, it stores the processed data in the HDFS of MRS.
3. The Database<->HDFS activity of DPS transfers the data stored in the HDFS to the data table of RDS.
Logging In to DPS
Step 1 Log in to the management console.
Step 2 Choose All Services > EI Enterprise Intelligence > Data Pipeline Service. The DPS console is opened.
----End
Creating Pipeline
Step 1 Click in the upper left corner on the DPS console and select your region and project.
Step 2 On the Pipeline Manager page, click Buy Pipeline.
Figure 2-9 Buying a pipeline
Step 3 On the Specify Details page, configure the required parameters (as shown in Figure 2-10) and click Buy Now.
Figure 2-10 Specifying service details
Step 4 On the Confirm Specifications page, confirm your order information, and click Next.
Figure 2-11 Confirming order information
Step 5 On the Pay page, select a payment mode and click OK.
After the pipeline is successfully bought, the system redirects you to the Pipeline Manager page.
----End
Configuring Pipeline
Step 1 On the Pipeline Manager page, click Edit in the Operation column for the newly created pipeline.
Step 2 Drag and drop the OBS, HDFS, and RDS data sources and the MapReduce and RDS<->HDFS activities to the edit grid area, and connect them as shown in Figure 2-12.
Figure 2-12 Connecting data sources and activities
Step 3 Click the data sources and activities one by one. On the configuration page that is displayed at the right of the edit grid area, configure the required parameters.
l Data source: For details about how to configure the data source, see Data Sources.
l Activity: For details about how to configure the activity, see Activities.
Step 4 Click . The system checks the parameter validity of the pipeline.
In the displayed dialog box with the message "Are you sure you want to save the pipeline?", click Yes. If the pipeline is valid, it is saved successfully.
----End
Scheduling Pipeline
Step 1 On the Pipeline Manager page, click Schedule in the Operation column for the newly created pipeline.
The Schedule Pipeline dialog box is displayed. Configure the pipeline schedule task as shown in Figure 2-13.
Figure 2-13 Configuring the pipeline schedule task
Step 2 Click OK.
Step 3 Click Run in the Operation column to start the schedule task for the pipeline.
----End
Viewing Pipeline Running Information
Viewing Pipeline Running Status:
Step 1 On the Pipeline Manager page, click the name of the newly created pipeline. You can view the pipeline running information in the Running History area of the displayed page.
Step 2 Click to refresh the pipeline and activity running information.
NOTE
To view the running status of each activity in the pipeline, click at the left side of each running record.
If the pipeline fails to run, click View Log in the Operation column (as shown in Figure 2-14). The log helps you find the failure cause.
Figure 2-14 Viewing activity running record
----End
Viewing the Output RDS Data:
Step 1 Use the MySQL client to connect to the RDS MySQL instance as the root user.
For details, see section "Connecting to an RDS MySQL Instance" in Relational Database Service User Guide.
Step 2 Run the database commands to go to the RDS database table storing HDFS data and view the detailed data in the database table.
Commands:
l Selecting a database: use database_name
l Viewing a database table: select * from table_name
----End
3 Installing DPS Agent
3.1 Overview
3.1.1 Introduction to DPS Agent
DPS Agent is a platform provided by Data Pipeline Service (DPS) for running user-defined activities. With DPS Agent, you can develop your own activities, such as Shell scripts, and then schedule and manage your activities using DPS.
3.1.2 Installation Flow
Figure 3-1 illustrates the installation flow of DPS Agent.
Figure 3-1 DPS Agent installation flowchart
3.2 Installation Preparation
3.2.1 Purchasing Elastic Cloud Server (ECS)
Procedure
Step 1 Purchase an ECS.
For details, see Getting Started > Purchasing an ECS in Elastic Cloud Server Usage Guide.
Step 2 Log in to the ECS.
For details, see Getting Started > Logging In to an ECS in Elastic Cloud Server Usage Guide.
l If no elastic IP address (EIP) has been purchased and bound to the ECS, you can log in to the ECS through Virtual Network Computing (VNC).
l If you have purchased an EIP and bound it to the ECS, you can log in to the ECS through Secure Shell (SSH).
NOTE
l You are advised to log in to the Linux-based ECS through SSH (during the login, you are required to enter the username and password). In this installation guide, logging in to the ECS through SSH is used as an example.
l If you need to bind an EIP to the ECS, see Binding EIP.
l If you need to unbind an EIP from the ECS, see Unbinding EIP. After the EIP is unbound from the ECS, you cannot log in to the ECS through SSH.
Step 3 (Optional) Set the security group.
The security group rules take effect in the following directions: inbound and outbound.
l Inbound: External services access the ECS server in the security group.
l Outbound: An ECS server in the security group accesses instances outside the security group.
To prevent malicious attacks, you are required to configure the inbound security group rule and set the outbound security group rule to any IP address (ensure that DPS Agent can normally access DPS). For details, see Configuring Security Group.
NOTE
For more information about security groups, see Security > Security Group in Virtual Private Cloud User Guide.
----End
3.2.2 Obtaining an AK/SK Pair
Background
Access Key ID/Secret Access Key (AK/SK) files are created by Identity and Access Management (IAM) to authenticate calls to application programming interfaces (APIs) on the public cloud.
During the startup of DPS Agent, DPS Agent uses the AK/SK pair to access DPS. After DPS Agent is started, it uses the AK/SK pair to access and operate other public cloud services.
NOTICE
Before creating the AK/SK pair, ensure that your public cloud account (used to log in to the management console) has passed the real-name authentication.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click your username in the upper right corner of the page, and select Basic Information from the drop-down list.
Step 3 On the Account Info page, click Manage my credentials.
Step 4 On the My Credential page, click the Access Keys tab. Then click Add Access Key. The Add Access Key dialog box is displayed.
If an AK/SK pair has been created, you can directly use it.
NOTE
Each user can create a maximum of two AK/SK pairs. If you want to create a new AK/SK pair, delete the existing one.
Step 5 Enter the required information as prompted and click OK to download the AK/SK file.
NOTE
l During the download of the AK/SK file, if you cancel the download, the AK/SK file cannot be re-downloaded.
l Save the downloaded AK/SK pair properly to prevent information leakage.
----End
Follow-up Procedure
If you find that your AK/SK pair is used abnormally (for example, the AK/SK pair is lost or leaked) or will no longer be used, delete your AK/SK pair in the IAM system or contact the OBS administrator to reset your AK/SK pair.
NOTE
Deleted AK/SK pairs cannot be restored.
3.2.3 Installing JRE
Prerequisites
l You have downloaded the Java Runtime Environment (JRE) installation package of version 1.8.0 or later. Download address: https://www.java.com/en/download/manual.jsp.
l You have obtained the EIP and the root user password of the ECS server.
l PuTTY and WinSCP tools have been installed on the local Windows-based PC.
Procedure
Step 1 Use the PuTTY tool to remotely log in to the ECS server as the root user.
Step 2 Run the following command to create the /opt/jre directory in the ECS server for storing the JRE installation package:
mkdir -p /opt/jre
Step 3 Run the following command to assign permission 777 to the JRE installation directory:
chmod -R 777 /opt/jre
Step 4 Use the WinSCP tool to upload the JRE installation package to the /opt/jre directory.
Step 5 Run the following commands to decompress the JRE installation package:
cd /opt/jre
tar -zxvf 'JRE installation package name'.tar.gz
Step 6 Run the following command to edit the /etc/profile configuration file.
vim /etc/profile
Set the JAVA_HOME configuration item to the JRE installation directory.
export JAVA_HOME=/opt/jre/jre_filename
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
/opt/jre/jre_filename is the path to the JRE installation directory after the JRE installation package is decompressed. You can change it as required.
After the modifications are completed, enter :wq to save the modifications and exit.
Step 7 Run the following command to make the JRE configuration take effect:
source /etc/profile
----End
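The profile edit in Steps 6 and 7 can be sketched end to end. The snippet below writes the three export lines to a scratch file instead of /etc/profile, and /opt/jre/jre1.8.0_202 is a hypothetical unpacked JRE directory name; substitute your real paths:

```shell
#!/bin/sh
# Sketch of the /etc/profile edit, using a scratch file so it can run anywhere.
PROFILE=/tmp/demo_profile                 # stands in for /etc/profile
JRE_DIR=/opt/jre/jre1.8.0_202             # hypothetical unpacked JRE directory

# Append the three export lines (the backslash escapes keep $PATH etc. literal).
cat >> "$PROFILE" <<EOF
export JAVA_HOME=$JRE_DIR
export PATH=\$PATH:\$JAVA_HOME/bin
export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar
EOF

# Load the settings into the current shell, as `source /etc/profile` would.
. "$PROFILE"
echo "JAVA_HOME is now $JAVA_HOME"
```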
Verification
Run the following command to query the JRE version. If the JRE version is earlier than 1.8.0, uninstall it and re-install the JRE of version 1.8.0 or later.
java -version
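The version check can also be scripted. The sketch below parses a sample `java -version` string (so it runs even without a JRE installed) and compares it against the 1.8.0 minimum using GNU `sort -V`; the sample string and the helper name are illustrative only:

```shell
#!/bin/sh
# jre_ok: succeed if the given version string is 1.8.0 or later.
# sort -V orders version strings numerically; if 1.8.0 sorts first (or is
# equal), the candidate version meets the minimum.
jre_ok() {
  [ "$(printf '%s\n' 1.8.0 "$1" | sort -V | head -n1)" = "1.8.0" ]
}

# Parse the quoted version out of a sample `java -version` first line.
SAMPLE='java version "1.7.0_80"'
VER=$(echo "$SAMPLE" | sed 's/.*"\(.*\)".*/\1/')

jre_ok "$VER" && echo "JRE $VER is sufficient"
jre_ok "$VER" || echo "JRE $VER is too old: uninstall it and install 1.8.0 or later"
```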
3.2.4 Configuring hosts File
Prerequisites
l You have obtained the EIP and the root user password of the ECS server.
l The PuTTY tool has been installed on the local Windows-based PC.
Procedure
Step 1 Use the PuTTY tool to remotely log in to the ECS server as the root user.
Step 2 Run the following commands to view the IP address and host name of the ECS server:
ip address
hostname
As shown in the following output, 192.168.0.43 indicates the IP address of the ECS server. Note that 192.168.0.43 is only an example here.
[root@ecs-192c ~]# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:59:10:ba brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.43/24 brd 192.168.0.255 scope global dynamic eth0
       valid_lft 73176sec preferred_lft 73176sec
    inet6 fe80::f816:3eff:fe59:10ba/64 scope link
       valid_lft forever preferred_lft forever
[root@ecs-192c ~]# hostname
ecs-192c
Step 3 Run the following command to modify the hosts file:
echo 'IP HOSTNAME' >> /etc/hosts
Parameter description:
IP HOSTNAME indicates the IP address and host name obtained in Step 2.
----End
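The Step 3 append can be made idempotent so repeated runs do not duplicate the entry. A sketch against a scratch file (the address and name are the example values from Step 2; the real target is /etc/hosts):

```shell
#!/bin/sh
# Sketch: add an "IP hostname" line to a hosts file only if it is missing.
HOSTS=/tmp/demo_hosts        # stands in for /etc/hosts
IP=192.168.0.43              # example address reported by `ip address`
NAME=ecs-192c                # example name reported by `hostname`

# Append only when no entry ending in this host name exists yet.
grep -q " $NAME\$" "$HOSTS" 2>/dev/null || echo "$IP $NAME" >> "$HOSTS"
cat "$HOSTS"
```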
3.3 Deploying DPS Agent
3.3.1 Installing DPS Agent
Prerequisites
l All operations in Installation Preparation have been completed.
l You have obtained the EIP and the root user password of the ECS server.
l The PuTTY tool has been installed on the local Windows-based PC.
Procedure
Step 1 Use the PuTTY tool to remotely log in to the ECS server as the root user.
Step 2 Run the following command to create the /opt/dps directory in the ECS server for storing theDPS Agent installation package:
mkdir -p /opt/dps
NOTE
/opt/dps is the default installation directory. If this directory does not exist, you need to run the preceding command to create it.
Step 3 Run the following commands to download the DPS Agent installation package:
cd /opt/dps
wget http://obs.myhwclouds.com/dps-program/dps-agent.tar.gz.sha256
wget http://obs.myhwclouds.com/dps-program/dps-agent.tar.gz
Step 4 Run the following command to check whether the DPS Agent installation package ismodified:
sha256sum -c dps-agent.tar.gz.sha256
l If OK is returned, the DPS Agent installation package has not been modified. Go to Step 5.
l If FAILED is returned, the DPS Agent installation package has been modified. Contact technical support to obtain a new installation package.
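The check in Step 4 can be rehearsed on a scratch file. The demo-agent names below are placeholders; the real files are dps-agent.tar.gz and its .sha256 companion:

```shell
#!/bin/sh
# Sketch: verify a download against its published SHA-256 digest.
cd /tmp
echo "pretend package contents" > demo-agent.tar.gz
# In reality the .sha256 file is downloaded alongside the package; here we
# generate it ourselves so the check has something to compare against.
sha256sum demo-agent.tar.gz > demo-agent.tar.gz.sha256

# `sha256sum -c` recomputes the digest and prints "OK" when they match.
if sha256sum -c demo-agent.tar.gz.sha256; then
  echo "package intact: continue with Step 5"
else
  echo "package modified: fetch a new copy"
fi
```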
Step 5 Run the following command to decompress the DPS Agent installation package:
tar -zxf dps-agent.tar.gz
Step 6 Run the following command to go to the directory generated after the DPS Agent installationpackage is decompressed:
cd agent
Step 7 Run the following command to run the DPS Agent installation program:
bash bin/install.sh
Enter y as prompted to continue the installation.
After the DPS Agent installation is complete, the system displays a message indicating that the installation succeeded and asking you to configure related files.
----End
3.3.2 Configuring DPS Agent
Prerequisites
l You have obtained the EIP and the root user password of the ECS server.
l The PuTTY tool has been installed on the local Windows-based PC.
l If the certificate verification function is enabled, ensure that the API gateway certificate is available. For details about how to generate an API gateway certificate, see Generating API Gateway Certificate.
Procedure
Step 1 Use the PuTTY tool to remotely log in to the ECS server as the root user.
Step 2 Run the following command to go to the DPS Agent installation directory:
cd /opt/dps/agent
Step 3 Run the following command to modify the DPS Agent configuration file:
vim conf/agent.conf
Table 3-1 Configuration items of the DPS Agent

agent.name (mandatory)
    DPS Agent name, which must be 1 to 64 characters long and contain only letters, digits, and underscores (_).

agent.user.ak (mandatory)
    AK obtained in Obtaining an AK/SK Pair.
    NOTICE: Before creating the AK/SK pair, ensure that your public cloud account (used to log in to the management console) has passed the real-name authentication.

agent.user.sk (mandatory)
    Encrypted SK.
    1. Obtain the SK. For details, see Obtaining an AK/SK Pair.
    2. Use the WCC tool to encrypt the obtained SK. For details, see Using the WCC Tool to Encrypt Passwords.
    NOTICE: Before creating the AK/SK pair, ensure that your public cloud account (used to log in to the management console) has passed the real-name authentication.

agent.apigateway.endpoint (mandatory)
    Domain name of the public cloud API Gateway address, for example, https://dps.cn-north-1.myhuaweicloud.com. You can obtain the domain name in Regions and Endpoints.

agent.obs.ip (mandatory)
    Domain name of the OBS server. You can obtain the domain name in Regions and Endpoints.

agent.trusted.jks.enabled (mandatory)
    An indication of whether to enable certificate verification. Default value: false.
    l false: Disable certificate verification.
    l true: Enable certificate verification. If you enable certificate verification, the following parameters also need to be configured: agent.trusted.jks.path, agent.trusted.jksPasswd, and agent.hostname.verify.

agent.trusted.jks.path (optional)
    Path to the directory where the API Gateway certificate is stored. For details, see Generating API Gateway Certificate.

agent.trusted.jksPasswd (optional)
    Ciphertext (encrypted) password of the API gateway certificate.
    1. Obtain the plaintext certificate password. For details, see Generating API Gateway Certificate.
    2. Use the WCC tool to encrypt the plaintext password. For details, see Using the WCC Tool to Encrypt Passwords.

agent.hostname.verify (optional)
    An indication of whether to enable domain name verification for the public cloud gateway.
    l false: Disable domain name verification.
    l true: Enable domain name verification.
----End
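Pulling Table 3-1 together, a filled-in conf/agent.conf for the simple no-certificate case might look like the sketch below. Every value is a placeholder: the name and keys must come from your own account, the endpoints from Regions and Endpoints, and the exact key syntax should be checked against the file shipped with the agent.

```
# Hypothetical example values -- replace each with your own.
agent.name=dps_agent_01
agent.user.ak=YOUR_ACCESS_KEY_ID
agent.user.sk=CIPHERTEXT_SK_FROM_WCC_TOOL
agent.apigateway.endpoint=https://dps.cn-north-1.myhuaweicloud.com
agent.obs.ip=obs.myhwclouds.com
agent.trusted.jks.enabled=false
```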
3.3.3 Starting DPS Agent
Prerequisitesl You have obtained the EIP and the root user password of the ECS server.
l The PuTTY tool has been installed on the local Windows-based PC.
Procedure
Step 1 Use the PuTTY tool to remotely log in to the ECS server as the root user.
Step 2 Run the following command to go to the DPS Agent installation directory:
cd /opt/dps/agent
Step 3 Run the following command to start DPS Agent:
bash bin/agent.sh start
Step 4 Log in to the DPS console.
Step 5 Create a pipeline. For details, see Buying a Pipeline.
Step 6 Edit the pipeline. For details, see Editing a Pipeline.
1. On the Edit page, drag and drop the Shell Script activity to the edit grid area (namely, Canvas), and click the activity.
2. The configuration page is displayed at the right side of the edit grid area. On this configuration page, click the ComputeResource drop-down list, and check whether DPS Agent started in Step 3 is included in the drop-down list.
– If yes, DPS Agent is successfully started.
– If no, DPS Agent failed to start. Perform Step 3 to restart DPS Agent. If DPS Agent is still not included in the ComputeResource drop-down list after the restart completes, contact technical support.
----End
3.3.4 Verifying DPS Agent
Prerequisites
l You have obtained the EIP and the root user password of the ECS server.
l The PuTTY tool has been installed on the local Windows-based PC.
Procedure
Step 1 Log in to the DPS console.
Step 2 Select the pipeline to be edited, and click Edit in the Operation column. The Edit page isdisplayed.
Step 3 On the displayed page, drag and drop the Shell Script activity to the edit grid area, and clickthe activity.
Step 4 The configuration page is displayed at the right side of the edit grid area. On this configuration page, configure the Shell Script activity. Table 3-2 shows the example configurations of the Shell Script activity.
Table 3-2 Configuring the Shell Script activity

Name (mandatory)
    Activity name. Example value: ShellScript_1271

Compute Resource (mandatory)
    Name of the DPS Agent that has been registered in the ECS server. Example value: test
    NOTE: If the installed DPS Agent is not available, contact technical support.

Script Path (mandatory)
    Absolute path to the shell script on the ECS server. Example value: /tmp/test.sh

Log Backup (mandatory)
    An indication of whether to back up logs. Example value: True

Destination Log Path (mandatory)
    Log backup directory. Currently, logs can be backed up only on OBS. Example value: s3a://dpsfile/log
Step 5 Click . In the displayed dialog box, click OK. If all the configurations are valid, a message is displayed, indicating that the pipeline is successfully saved.
If the pipeline fails to be saved, the possible causes are as follows:
l There is a loop in the pipeline.
l There are more than 32 activities in the pipeline.
l The configurations of a data source or activity are invalid.
l The link relationships of an activity are not complete.
Step 6 Use the PuTTY tool to remotely log in to the ECS server as the root user. Run the following commands in the /tmp directory to create the test.sh script:
cd /tmp
touch test.sh
Step 7 Run the following command to edit the test.sh script:
vim test.sh
Enter i, and add the following lines to the test.sh script. Then enter :wq to save the modifications and exit.
BIN_HOME=`dirname $0`    # Query the script path.
cd $BIN_HOME             # Switch to the directory where the script is stored.
echo "Hello World" > /tmp/result.txt
Step 8 Run the following command to check the test.sh script:
cat test.sh
[datasight@cce-masterinit tmp]# cat test.sh
echo "Hello World" > /tmp/result.txt
Step 9 Run the following command to set the execution permission for the test.sh script:
chmod 750 test.sh
[datasight@cce-masterinit tmp]# chmod 750 test.sh
[datasight@cce-masterinit tmp]# ls -l
total 8
-rwxr-xr-x 1 datasight datasight 37 Apr 10 22:07 test.sh
Step 10 Log in to the DPS console. On the Pipeline Manager page, select the pipeline to be run, and click Run in the Operation column.
Step 11 Click the name of the pipeline. You can view the pipeline and activity running information in the Running History area of the displayed page.
Step 12 Use the PuTTY tool to remotely log in to the ECS server as the root user. Run the following command, and check whether the message Hello World is displayed in the /tmp/result.txt file:
cat /tmp/result.txt
If the following information is displayed after the command is run, DPS Agent runs normally:
[datasight@cce-masterinit tmp]# cat /tmp/result.txt
Hello World
----End
3.3.5 Stopping DPS Agent
Prerequisites
l You have obtained the EIP and the root user password of the ECS server.
l The PuTTY tool has been installed on the local Windows-based PC.
Procedure
Step 1 Use the PuTTY tool to remotely log in to the ECS server as the root user.
Step 2 Run the following command to go to the DPS Agent installation directory:
cd /opt/dps/agent
Step 3 Run the following command to stop DPS Agent.
bash bin/agent.sh stop
----End
3.4 (Optional) Connecting to DWS Cluster
Background
Data Warehouse Service (DWS) is an online data processing database that runs on the public cloud architecture and platform.
DPS provides the DWS activity to help you quickly process and transfer data. For details, see DWS SQL. If you need to use the DWS activity provided by DPS, download and configure the DWS client by following the instructions provided in this section.
Prerequisites
l A DWS cluster has been created, and you have obtained the internal access address, port number, admin account, and password of the cluster.
l You have obtained the EIP and the root user password of the ECS server.
l PuTTY and WinSCP tools have been installed on the local Windows-based PC.
Procedure
Step 1 Download the DWS client file.
1. Log in to the DWS console.
2. Click Connection Management in the left navigation pane. On the displayed page, select the required client type, and click Download.
Step 2 Use the WinSCP tool to upload the DWS client file to the /tmp directory of the ECS server.
Step 3 Configure the DWS client and connect it to the DWS cluster.
1. Use the PuTTY tool to remotely log in to the ECS server as the root user.
2. Run the following command to go to the directory where the DWS client file is stored:
cd /tmp
3. Run the following command to decompress the DWS client file:
tar -xvf dws_client_redhat_x64.tar.gz
4. Run the following command to configure the DWS client:
source gsql_env.sh
If the following information is displayed, the DWS client is configured successfully:
All things done.
5. Run the following command to use the gsql tool provided by the DWS client to connect to the database in the DWS cluster:
gsql -d postgres -h IP -U dbadmin -p PORT -W Password
Modify the following parameters based on the actual environment:
– IP: internal access address of the DWS cluster.
– dbadmin: administrator of the DWS cluster.
– PORT: port number of DWS.
– Password: password of the administrator.
If the following information is displayed, the gsql tool is successfully connected to the database:
postgres=>
6. Run the following command to exit the gsql tool:
\q
----End
3.5 Common Operations
3.5.1 Binding EIP
Procedure
Step 1 Log in to the management console.
Step 2 On the homepage, choose Network > Virtual Private Cloud.
Step 3 In the left navigation pane, click Elastic IP Address.
On the displayed Elastic IP Address page, you can purchase and bind the EIP. For details, see Network Components > EIP > Assigning an EIP and Binding It to an ECS in Virtual Private Cloud User Guide.
----End
3.5.2 Unbinding EIP
Procedure
Step 1 Log in to the management console.
Step 2 On the homepage, choose Network > Virtual Private Cloud.
Step 3 In the left navigation pane, click Elastic IP Address.
Step 4 Find the target EIP in the EIP list, and click Unbind in the Operation column.
Step 5 In the displayed dialog box, click OK.
----End
3.5.3 Configuring Security Group
Procedure
Step 1 Log in to the management console.
Step 2 On the homepage, select Network > Virtual Private Cloud.
Step 3 In the left navigation pane, click Security Group.
Step 4 On the displayed Security Group page, click Create Security Group, and then complete the creation of a security group as instructed.
Step 5 Find the newly created security group in the security group list, and click Add Rule in the Operation column. In the displayed Add Rule dialog box, configure the rules for the security group.
NOTE
l Inbound: Set this parameter based on the actual requirements.
l Outbound: Set the configurations by referring to Figure 3-2.
Figure 3-2 Adding security group rules
----End
Follow-Up Operations
After a security group is added, you need to add the newly purchased ECS server to the security group.
Step 1 Log in to the management console.
Step 2 In the homepage, choose Computing > Elastic Cloud Server.
Step 3 In the ECS list, click the name of the target ECS.
Step 4 On the displayed page, click the NIC tab, and click Change Security Group.
Step 5 In the displayed Change Security Group dialog box, select the security group created in Step 4.
----End
3.5.4 Generating API Gateway Certificate
Procedure
Step 1 Remotely log in to the ECS server as the root user.
Step 2 Run the following command to obtain the public cloud API gateway server certificate:
echo -n | openssl s_client -connect IP:PORT | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > apigateway.pem
Parameter description:
IP:PORT indicates the IP address and port number of the public cloud API gateway.
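The sed range in the Step 2 command simply keeps the PEM block from the openssl output. It can be seen in isolation on a canned sample (the certificate body below is fake):

```shell
#!/bin/sh
# Sketch: extract the BEGIN/END CERTIFICATE block from s_client-style output.
cat > /tmp/demo_sclient.txt <<'EOF'
CONNECTED(00000003)
depth=0 CN = example
-----BEGIN CERTIFICATE-----
MIIBfakeCertificateBodyForIllustrationOnly
-----END CERTIFICATE-----
closing handshake notes
EOF

# -n suppresses default printing; the /start/,/end/p range prints only the PEM.
sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' /tmp/demo_sclient.txt \
  > /tmp/demo_apigateway.pem
cat /tmp/demo_apigateway.pem
```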
Step 3 Run the following command to generate the gateway.jks certificate:
keytool -import -file apigateway.pem -keystore gateway.jks
After this command is run, the system prompts you to configure the certificate password. This password will be used in the other operations. Keep it confidential to protect information security.
Step 4 Run the following command to copy the gateway.jks certificate to the conf directory:
cp gateway.jks /opt/dps/agent/conf/
----End
3.5.5 Using the WCC Tool to Encrypt Passwords
Procedure
Step 1 Remotely log in to the ECS server as the root user.
Step 2 Run the following command to go to the DPS Agent installation directory:
cd /opt/dps/agent
Step 3 Run the following command to run the WCC tool:
bash bin/encrypt-tool.sh
Enter the plaintext password as prompted.
The WCC tool encrypts the entered plaintext password.
Save the encrypted password properly.
----End
3.5.6 Modifying the Run User and User Group of DPS Agent
Background
After DPS Agent is installed, using non-root users to run DPS Agent is recommended in order to ensure system security.
This section describes how to modify the run user of DPS Agent to the datasight user.
Procedure
Step 1 Remotely log in to the ECS server as the root user.
Step 2 Run the following commands to create the datasight user and user group:
groupadd datasight
useradd datasight -g datasight -m -s /bin/bash
usermod datasight -a -G datasight
Step 3 Run the following command to go to the directory where the DPS Agent installation package is stored, for example, /opt/dps:
cd /opt
Run the following command to modify the user and user group of the DPS Agent installation package:
chown -R datasight:datasight dps
The user and user group of the DPS Agent installation package are changed from root to datasight, as shown in the following:
[root@DPSNCM01 dps]# ll
total 4
-rw-r----- 1 datasight datasight 123027036 Apr 14 18:30 DPSAgent.zip
[root@DPSNCM01 dps]# ll ..
total 8
drwxr-x--- 2 datasight datasight 4096 Apr 14 18:30 dps
drwxr-x--- 2 root root 4096 Apr 14 18:31 jdk
Step 4 Run the following command to switch from the root user to the datasight user:
su datasight
The following information is displayed:
[root@cce-masterinit usr]# su datasight
[datasight@cce-masterinit usr]$
----End
3.5.7 Resetting the Password of API Gateway Certificate
Procedure
Step 1 Remotely log in to the ECS server as the root user.
Step 2 Run the following commands to go to the DPS Agent installation directory and stop the running of DPS Agent:
cd /opt/dps/agent
bash bin/agent.sh stop
Step 3 Run the following command to go to the directory where the API gateway certificate is stored:
cd /opt/dps/agent/conf
Step 4 Run the following command to change the certificate password:
keytool -storepasswd -keystore gateway.jks
Enter the old certificate password as prompted, and then enter the new certificate password twice. Then the certificate password is changed successfully.
Step 5 Use the WCC tool to encrypt the new plaintext password, and write the generated ciphertext password in the agent.trusted.jksPasswd configuration item of the agent.conf file.
For details, see Using the WCC Tool to Encrypt Passwords and Configuring DPS Agent.
Step 6 Run the following command to start DPS Agent:
cd /opt/dps/agent
bash bin/agent.sh start
----End
4 Working With DPS
4.1 Pipeline Manager
4.1.1 Buying a Pipeline
Scenario
A pipeline is a logical group of activities that collaboratively execute a data processing task. Before using DPS, you need to purchase pipelines.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The number of pipelines does not exceed the quota.
NOTE
By default, a maximum of 10 pipelines can be created. If this quota does not meet your requirements, you can increase it. To increase the quota, click Apply for a higher quota.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Pipeline Manager.
Step 4 On the Pipeline Manager page, click Buy Pipeline. The Buy Pipeline page is displayed.
Step 5 On the Basic Information page, configure the pipeline parameters. Table 4-1 describes the pipeline parameters.
Table 4-1 Pipeline parameters

Parameter | Description
Pipeline Name | Name of the pipeline. A pipeline name is 1 to 62 characters long and contains only letters, digits, and underscores (_).
Description | Pipeline description.
Preset Data | Preset pipeline configuration mode.
  - Not Configured: No configurations are preset for the pipeline.
  - Import from File: Import the preset pipeline configurations from a JSON pipeline file.
  - Import from Template: Use the preset pipeline configurations in a template provided by DPS.
Region | Current region.
Purchase Quantity | Validity period of the pipeline. After you determine the validity period, DPS automatically calculates the fees you need to pay.
  NOTE: For billing details, click Price Details in Price.
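The console enforces the naming rule above when you submit the form. If you generate pipeline names in scripts, a client-side check such as the following sketch can catch invalid names early; the helper name is ours, and the rule is taken from Table 4-1 (1 to 62 characters; letters, digits, and underscores only).

```shell
# Sketch: mirror the documented pipeline-name rule. The console performs
# the authoritative validation; this only pre-checks names in scripts.
valid_pipeline_name() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_]{1,62}$'
}
valid_pipeline_name 'daily_etl_01' && echo "daily_etl_01: valid"
valid_pipeline_name 'daily-etl-01' || echo "daily-etl-01: invalid (hyphens not allowed)"
```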
Step 6 Click Buy Now.
Step 7 On the Confirm Specifications page, confirm your order information, and click Next.
Step 8 Select one of the following payment modes: Coupons, Balance, Online Payment, or Pay by transfer and remittance.
Step 9 Click OK. The service is purchased.
----End
4.1.2 Editing a Pipeline
Scenario
You can edit the data sources and activities of a purchased pipeline if necessary.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Pipelines have been purchased.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Pipeline Manager.
Step 4 On the Pipeline Manager page, enter the name of the pipeline in the search box in the upper right corner, and click the search icon.
Step 5 Click Edit in the Operation column for a created pipeline. The Edit page is displayed.
The left side of the Edit page is divided into two sections:
- Data Sources. For details, see Data Sources.
- Activities. For details, see Activities.
Step 6 Drag any data source or activity and drop it in the edit grid area (Canvas) on the right side. The following procedure uses an OBS data source as an example to show how to configure a data source or activity:
1. Drag and drop the OBS data source to the edit grid area. Click the OBS data source.
2. The configuration page is displayed on the right of the edit grid area. Configure the OBS properties.
NOTE
You can also click the import icon on the upper part of the edit grid area to import a pipeline file from your local directory. The newly imported pipeline information will overwrite the existing one.
Step 7 Hover your mouse over the icon of the OBS data source. A connection icon appears. Drag this icon to link the OBS data source to an activity.
Figure 4-1 shows a successfully linked pipeline.
Figure 4-1 Successfully linked pipeline
Table 4-2 describes the link relationships between data sources and activities.
Table 4-2 Link relationships between data sources and activities

Activity | Link Relationships
HDFS<->OBS | OBS -> [HDFS<->OBS] -> HDFS; HDFS -> [HDFS<->OBS] -> OBS
Database<->HDFS | HDFS -> [Database<->HDFS] -> RDS; RDS -> [Database<->HDFS] -> HDFS
HDFS->HBASE | HDFS -> [HDFS->HBASE] -> HBase
UQuery<->OBS | OBS -> [UQuery<->OBS] -> UQuery Table; UQuery Table -> [UQuery<->OBS] -> OBS
ExecuteCDM | CDM Source/OBS -> ExecuteCDM -> CDM Source/OBS
CDM Job | Any data source -> CDM Job -> any data source. The CDM Job activity can be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.
SparkSQL | Any data source -> SparkSQL -> any data source
Spark | OBS/HDFS/Dummy -> Spark -> OBS/HDFS/Dummy
Hive | OBS/HDFS/Dummy -> Hive -> OBS/HDFS/Dummy
MapReduce | OBS/HDFS/Dummy -> MapReduce -> OBS/HDFS/Dummy
Shell Script | Any data source -> Shell Script -> any data source. The Shell Script activity can be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.
MachineLearning | HDFS/Dummy -> MachineLearning -> HDFS/Dummy
Elasticsearch | OBS/ES Storage/Dummy -> Elasticsearch -> ES Storage; ES Storage -> Elasticsearch -> OBS/ES Storage/Dummy
RDS SQL | RDS -> RDS SQL -> RDS
DWS SQL | DWS -> DWS SQL -> DWS
UQuery SQL | OBS -> UQuery SQL -> UQuery Table; UQuery Table -> UQuery SQL -> UQuery Table
Create OBS | Any data source -> Create OBS -> any data source. The Create OBS activity can be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.
Delete OBS | Any data source -> Delete OBS -> any data source. The Delete OBS activity can be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.
Step 8 Click the save icon. A dialog box with the message "Are you sure you want to submit the pipeline?" is displayed. Click Yes.
If the pipeline fails to be saved, the possible causes are as follows:
- An isolated data source exists in the pipeline.
- There is a loop in the pipeline.
- There are more than 32 activities in the pipeline.
- The configurations of a data source or activity are invalid.
- The link relationships of an activity are incomplete.
Step 9 (Optional) Click the run icon to run the pipeline.
After the pipeline runs, you can view the current running status and running result of each activity in the edit grid area.
If you want to view the running status of the pipeline after you exit the editing page, use one of the following methods:
- On the Pipeline Manager page, click the expand icon on the left of the pipeline name.
- On the Pipeline Manager page, click the pipeline name. For details, see Monitoring a Pipeline.
Step 10 (Optional) Click the export icon to export pipeline data as a JSON pipeline file to your local PC.
NOTE
The Pipeline Manager page also provides a function for exporting pipeline data. For details, see Exporting a Pipeline.
Step 11 (Optional) Click the note icon to add a note to the edit grid area for remarks.
To add an associated note for a data source or activity, select the data source or activity and click the note icon. Alternatively, right-click the data source or activity in the edit grid area and choose Add Note from the shortcut menu.
Constraints on using notes:
- Each note can contain a maximum of 1000 English characters.
- Each data source or activity can have multiple notes.
- A pipeline can have a maximum of 40 notes.
----End
4.1.3 Scheduling a Pipeline
Scenario
After editing a pipeline, you can configure a pipeline scheduling mode: manual or automatic.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The pipeline has been edited and is stopped.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Pipeline Manager.
Step 4 On the Pipeline Manager page, enter the name of the pipeline in the search box in the upper right corner, and click the search icon.
Step 5 Click Schedule in the Operation column of a pipeline. The Schedule Pipeline dialog box is displayed. Configure the pipeline schedule task by referring to Table 4-3.
Table 4-3 Pipeline schedule parameters

Parameter | Description
Schedule Type | Pipeline schedule type. Options are as follows:
  - Run once: The pipeline runs only once.
  - Run periodically: The pipeline runs periodically.
Running Cycle | Interval at which the pipeline runs. This parameter is displayed when Schedule Type is set to Run periodically.
Start Time | Time at which the pipeline schedule task starts. It must be earlier than the end time. This parameter is displayed when Schedule Type is set to Run periodically.
End Time | Time at which the pipeline schedule task ends. This parameter is displayed when Schedule Type is set to Run periodically.
Cross-Cycle Dependency | Dependency between instances of the same pipeline.
  - Not dependent on the previous schedule cycle.
  - Self-dependent: The current schedule task can continue only after the previous schedule cycle ends.
  This parameter is displayed when Schedule Type is set to Run periodically.
Dependency pipeline | Pipeline A cannot depend on Pipeline B in any of the following circumstances:
  - The running cycle of Pipeline B is longer than that of Pipeline A.
  - Pipeline B's running cycle is set to hours or minutes, while that of Pipeline A is set to weeks.
  This parameter is displayed when Schedule Type is set to Run periodically.
Dependency execution strategy | Pipeline execution policy when a pipeline depends on other pipelines. Options are as follows:
  - success: The current pipeline is executed only when the pipeline instances it depends on are executed successfully.
  - any result: The current pipeline is executed when the pipeline instances it depends on finish, regardless of the execution result.
Step 6 Click OK to save the schedule configurations.
Step 7 Click Run in the Operation column for the pipeline. The Run button then changes to Pause.
- If Schedule Type is set to Run Once, the pipeline starts to run immediately.
- If Schedule Type is set to Run Periodically, the pipeline starts to run at the preset time.
----End
4.1.4 Monitoring a Pipeline
Scenario
You can view the running status and log information of pipelines and activities if necessary.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Pipelines have been purchased.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Pipeline Manager.
Step 4 On the Pipeline Manager page, enter the name of the pipeline in the search box in the upper right corner, and click the search icon.
Step 5 Click the name of the pipeline to be monitored. You can view the pipeline monitoring information in the Running History area of the displayed page.
Step 6 Click the refresh icon to refresh the monitoring information.
Table 4-4 Pipeline monitoring parameters

Parameter | Description
Status | Status of the schedule task, which can be Success, Failed, Running, Paused, Deleted, or Canceled.
Running Duration (min) | Running duration of the pipeline.
Start Time | Time at which the pipeline starts to run.
End Time | Time at which the pipeline stops running.
Instance Generation Time | Time at which the instance was generated.
Running Type | Scheduling mode of the pipeline.
Click the expand icon on the left of a pipeline running record. The running information about each activity of the pipeline is then displayed.
Table 4-5 Activity monitoring parameters

Parameter | Description
Name | Activity name.
Type | Activity type.
Status | Activity status, which can be Success, Failed, Running, Paused, Deleted, or Canceled.
Running Duration (min) | Running duration of the activity.
Start Time | Time at which the activity starts to run.
Retry Count | Number of retries upon an activity execution failure.
Operation | View Log: You can query the logs of activities in the Success or Failed state. Logs cannot be queried in the following scenarios:
  - Logs of the SparkSQL and MachineLearning activities cannot be queried.
  - If the log backup property for activities is set to false, no logs can be viewed. For details, see Activities.
  NOTE: If an activity encounters a fault, logs help you quickly locate and resolve the fault.
Error Message | Error message that is displayed.
----End
4.1.5 Exporting a Pipeline
Scenario
If you need to back up a pipeline, or use an existing pipeline as a template for new pipelines, export it as a JSON pipeline file.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Pipelines have been purchased.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Pipeline Manager.
Step 4 On the Pipeline Manager page, enter the name of the pipeline in the search box in the upper right corner, and click the search icon.
Step 5 Select the pipeline to be exported, and choose More > Export in the Operation column to export the pipeline data as a JSON pipeline file.
NOTE
The exported JSON pipeline file does not contain the pipeline schedule configurations or sensitive information (such as accounts and passwords).
----End
4.1.6 Stopping a Pipeline
Scenario
You can stop a pipeline if necessary.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The pipeline is not in the Stopped, Deleted, or Frozen state.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Pipeline Manager.
Step 4 On the Pipeline Manager page, enter the name of the pipeline in the search box in the upper right corner, and click the search icon.
Step 5 Choose More > Stop in the Operation column for a pipeline. In the displayed dialog box, click OK to confirm your operation.
----End
4.1.7 Deleting a Pipeline
Scenario
You can delete a pipeline if the pipeline will not be used any longer.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Ensure that services are not affected after you delete the pipeline. If you need to back up pipelines, see Exporting a Pipeline.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Pipeline Manager.
Step 4 On the Pipeline Manager page, enter the name of the pipeline in the search box in the upper right corner, and click the search icon.
Step 5 Choose More > Delete in the Operation column for a pipeline. In the displayed dialog box, click OK to confirm your operation.
NOTE
A deleted pipeline can be restored.
----End
4.2 Connector List
4.2.1 Creating a DataSource Connector
Scenario
A DataSource connector records the connection information of RDS and DWS data sources. A defined DataSource connector is available to RDS and DWS data sources.
A connector can be used by more than one DPS data source. If the connector information changes, you only need to modify the connector configurations in the connector list; the modified configurations are automatically applied to the data sources of the pipeline.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The current number of connectors has not reached the connector quota (20).
- You have obtained the username, password, and URL of the data source.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Connector List.
Step 4 On the Connector List page, click Create Connector.
Step 5 In the displayed dialog box, select DataSource from the Connector Type drop-down list. Configure the DataSource parameters by referring to Table 4-6.
Table 4-6 DataSource parameters

Parameter | Mandatory or Not | Description | Example
Connector Name | Yes | Connector name. A connector name is 1 to 64 characters long and contains only letters, digits, and underscores (_). | dps_database_123
Database Driver Name | Yes | Name of the database driver: com.mysql.jdbc.Driver is used for RDS data sources; org.postgresql.Driver is used for DWS data sources. | com.mysql.jdbc.Driver
Connector URL | Yes | URL of the database. | jdbc:mysql://IP:PORT
Database Name | Yes | Database name. | dps
Username | Yes | Username for logging in to the database. | dpsadmin
Password | Yes | Password for logging in to the database. | -
Drive Path | Yes | Path to the JDBC driver. Download the JDBC driver from the MySQL official website as required and upload it to the OBS bucket. If Database Driver Name is set to com.mysql.jdbc.Driver, use the mysql-connector-java-5.1.21.jar driver. | s3a://dpsfile/mysql-connector-java-5.1.21.jar
KMS Encryption | Yes | Use KMS to encrypt and decrypt user passwords and private keys. Options: keys created in KMS. | dps/default
NOTE
KMS is the key management service provided by the public cloud. To create or manage keys, log in to the management console and choose Security > Key Management Service on the homepage to open the KMS console.
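The driver class and the Connector URL scheme have to agree. Table 4-6 shows the jdbc:mysql://IP:PORT form for com.mysql.jdbc.Driver; pairing org.postgresql.Driver with a jdbc:postgresql://IP:PORT URL follows the standard PostgreSQL JDBC convention and is an assumption here, as is the helper name in this sketch.

```shell
# Sketch: derive a consistent Connector URL for a given Database Driver Name.
# The jdbc:postgresql scheme is assumed from the standard PostgreSQL JDBC driver;
# url_for_driver is a hypothetical helper, not part of DPS.
url_for_driver() {
  case "$1" in
    com.mysql.jdbc.Driver)  echo "jdbc:mysql://$2:$3" ;;      # RDS (MySQL)
    org.postgresql.Driver)  echo "jdbc:postgresql://$2:$3" ;; # DWS
    *) echo "unsupported driver: $1" >&2; return 1 ;;
  esac
}
url_for_driver com.mysql.jdbc.Driver 192.0.2.10 3306
url_for_driver org.postgresql.Driver 192.0.2.11 5432
```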
Step 6 Click OK. A connector is created successfully.
----End
4.2.2 Creating a CDM Connector
Scenario
A CDMSource connector records the connection information of the input and output data sources of CDM. A defined CDMSource connector is available to CDM Source data sources.
A connector can be used by more than one DPS data source. If the connector information changes, you only need to modify the connector configurations in the connector list; the modified configurations are automatically applied to the data sources of the pipeline.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The current number of connectors has not reached the connector quota (20).
- You have obtained the connector information, server IP addresses, and server ports of the input and output data sources of CDM.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Connector List.
Step 4 On the Connector List page, click Create Connector.
Step 5 In the displayed dialog box, select CDMSource from the Connector Type drop-down list. Configure the CDMSource parameters by referring to Table 4-7.
Table 4-7 CDMSource parameters

Parameter | Mandatory or Not | Description | Example
Connector Name | Yes | Connector name. A connector name is 1 to 21 characters long and contains only letters, digits, and underscores (_). | dps_cdmsource_123
Connector Type | Yes | Connector type: obs-connector connects to the OBS data source (for parameter settings, see Table 4-8); generic-jdbc-connector connects to DWS and MySQL (for parameter settings, see Table 4-9). | -
Table 4-8 Parameter settings for obs-connector

Parameter | Mandatory or Not | Description
Database IP Address | Yes | IP address of the OBS server.
Database Port | Yes | Port number of the OBS server.
AK | Yes | AK used for accessing the OBS server.
SK | Yes | SK used for accessing the OBS server.
KMS Encryption | Yes | Use KMS to encrypt and decrypt user passwords and private keys. Options: keys created in KMS.
NOTE
KMS is the key management service provided by the public cloud. To create or manage keys, log in to the management console and choose Security > Key Management Service on the homepage to open the KMS console.
Table 4-9 Parameter settings for generic-jdbc-connector

Parameter | Mandatory or Not | Description
Database Type | Yes | Type of the database: DWS or MYSQL.
Database Name | Yes | Name of the database.
Database IP Address | Yes | IP address of the database server.
Database Port | Yes | Port number of the database server.
Username | Yes | Username used to log in to the database. Ensure that this user has permission to read and write data tables and read metadata in the database.
Password | Yes | Password used to log in to the database.
KMS Encryption | Yes | Use KMS to encrypt and decrypt user passwords and private keys. Options: keys created in KMS.
Step 6 Click OK. A connector is created successfully.
----End
4.2.3 Creating an ESSource Connector
Scenario
An ESSource connector records the connection information of an ES cluster. A defined ESSource connector is available to ES Storage data sources.
A connector can be used by more than one DPS data source. If the connector information changes, you only need to modify the connector configurations in the connector list; the modified configurations are automatically applied to the data sources of the pipeline.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The current number of connectors has not reached the connector quota (20).
- You have obtained the URL information of the ES cluster.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Connector List.
Step 4 On the Connector List page, click Create Connector.
Step 5 In the displayed dialog box, select ESSource from the Connector Type drop-down list. Configure the ESSource parameters by referring to Table 4-10.
Table 4-10 ESSource parameters

Parameter | Mandatory or Not | Description | Example
Connector Name | Yes | Connector name. A connector name is 1 to 24 characters long and contains only letters, digits, and underscores (_). | dps_essource_123
Connector URL | Yes | IP address and port number used to access the ES cluster through the private network, in the format http://IP:PORT. Port 9200 is recommended for accessing the ES cluster. | http://128.10.46.226:9200
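A quick format check before saving the connector can catch URLs that do not match the http://IP:PORT form required above; the helper below is a sketch and its name is ours. Once the URL format is right, querying the standard Elasticsearch /_cluster/health endpoint (for example, with curl) can confirm that the cluster actually responds; that step needs a live cluster and is omitted here.

```shell
# Sketch: validate that a Connector URL has the documented http://IP:PORT form.
valid_es_url() {
  printf '%s' "$1" | grep -Eq '^http://([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}$'
}
valid_es_url 'http://128.10.46.226:9200' && echo "format ok"
valid_es_url 'https://escluster:9200'    || echo "not in http://IP:PORT form"
# On a reachable cluster: curl -sf "http://128.10.46.226:9200/_cluster/health"
```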
Step 6 Click OK. A connector is created successfully.
----End
4.2.4 Editing a Connector
Scenario
You can modify a connector that has been created if necessary.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Connectors have been created.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Connector List.
Step 4 On the Connector List page, click Edit in the Operation column for the connector to be edited.
Step 5 On the editing page, modify the connector information. For the description of each parameter, see the following:
- DataSource connector: Table 4-6
- CDMSource connector: Table 4-7
- ESSource connector: Table 4-10
Step 6 Click OK. The modified parameter settings are saved successfully.
----End
4.2.5 Deleting a Connector
Scenario
You are advised to delete a connector that is no longer used, to reduce quota usage.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Connectors have been created but are not being used by any pipelines.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Connector List.
Step 4 On the Connector List page, click Delete in the Operation column for the connector to be deleted.
Step 5 In the displayed dialog box, click OK to delete the connector.
----End
4.3 Resource List
4.3.1 Creating a DIS Resource
Scenario
DPS allows you to create a Data Ingestion Service (DIS) stream immediately, or create or delete a DIS stream at a specific point in time. Proper DPS configurations help you fully utilize DIS streams while reducing usage costs.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The current number of resources has not reached the resource quota (20).
- Currently, only common Data Ingestion Service (DIS) streams can be created.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Resource List.
Step 4 On the Resource List page, click Create Resource.
Step 5 In the dialog box that is displayed, select DIS from the Type drop-down list. Configure the DIS resource parameters by referring to Table 4-11.
Table 4-11 DIS resource parameters

Parameter | Mandatory or Not | Description | Example
Schedule Type | Yes | Resource scheduling type. At Once: resources are created at once; if they are no longer used, you need to manually delete them. On Schedule: resources are automatically scheduled, created, and deleted at your specified resource schedule cycle, creation time, and deletion time; ensure that the interval between the resource creation time and the resource deletion time is longer than the resource schedule cycle. | -
Stream Name | Yes | Unique name of the DIS stream used to send or receive data. A stream name is 1 to 64 characters long and contains only letters, digits, hyphens (-), and underscores (_). | dis-5acb
Partitions | Yes | Number of partitions into which data records of the newly created DIS stream will be distributed. Value range: an integer from 1 to 50. | 10
Data Dumping | Yes | Location in which data from the DIS stream will be stored. No Dump: data will be stored only in DIS. Dump to OBS: data will be stored in DIS and periodically dumped to OBS (for parameter settings, see Table 4-12). NOTE: Data stored in DIS can be retained for only 24 hours; after this period expires, the data is automatically cleared. | -
Table 4-12 Parameter settings for dumping data to OBS

Parameter | Mandatory or Not | Description
Dumped To | Yes | Name of the OBS bucket used to store data from the DIS stream.
IAM Agency | Yes | DIS uses an agency to access your specified resources such as OBS buckets. Select an IAM agency from the drop-down list.
Dump Type | Yes | Data dumping type. Custom file: you can select which streaming data will be saved into which folder; custom files are dumped to OBS immediately after they are generated in the chosen folder. Periodic: streaming data is automatically saved into files in the chosen directory; files are then dumped to OBS at regular intervals.
Dump File Directory | No | User-defined directory storing files that will be dumped to OBS. Use slashes (/) to separate directory levels. This parameter is displayed only when Dump Type is set to Periodic.
Dump Interval (s) | Yes | User-defined interval at which data from the DIS stream is dumped to OBS. Value range: 60 to 900. This parameter is displayed only when Dump Type is set to Periodic.
Step 6 Click OK to complete the resource creation.
----End
4.3.2 Creating an MRS Resource
Scenario
DPS allows you to create an MRS cluster immediately, or create or delete an MRS cluster on demand or at a specific point in time. Proper DPS configurations help you fully utilize MRS clusters while reducing usage costs.
MRS clusters in the resource management list can be used by data sources and activities that require MRS clusters, for example, HDFS data sources and MapReduce activities.
Prerequisites
- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The current number of resources has not reached the resource quota (20).
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Resource List.
Step 4 On the Resource List page, click Create Resource.
Step 5 In the dialog box that is displayed, select MRS from the Type drop-down list. Configure the MRS resource parameters by referring to Table 4-13.
Table 4-13 MRS resource parameters

Schedule Type (mandatory)
  Resource scheduling type. Options are as follows:
  - On Demand: Resources are automatically created and deleted based on the resource usage of the pipeline.
  - At Once: Resources are created at once. If the resources are no longer used, you need to manually delete them.
  - On Schedule: Resources are automatically scheduled, created, and deleted at your specified resource schedule cycle, creation time, and deletion time. Ensure that the interval between the resource creation time and the resource deletion time is longer than the resource schedule cycle.

Resource Name (mandatory; example: mrs_123)
  Resource name. A resource name is 1 to 64 characters long and contains only letters, digits, hyphens (-), and underscores (_).

Cluster Name (mandatory; example: mrs_11c6)
  Unique name of the MRS cluster. A cluster name is 1 to 64 characters long and contains only letters, digits, hyphens (-), and underscores (_).

AZ (mandatory; example: AZ 1)
  An AZ is an area where power and networks are physically isolated. AZs in the same region can communicate with each other over an intranet. Currently, only the northchina 1 and eastchina 2 regions are supported. The available AZs under each region are as follows:
  - In the northchina 1 region: AZ 2.
  - In the eastchina 2 region: AZ 1.

VPC (mandatory; example: vpc-11)
  Virtual private cloud (VPC) in which the cluster is created. If there is no available VPC, create one in advance.

Subnet (mandatory; example: subnet-11)
  Subnet of the cluster. If there is no available subnet, create one in the VPC in advance.

Cluster Version (mandatory; example: MRS 1.3.0)
  Currently, MRS 1.3.0, MRS 1.5.0, MRS 1.5.1, and MRS 1.6.0 are supported.

Key Pair (mandatory; example: KeyPair-7dbd)
  Key pair used to access the master node of the cluster. If there is no available key pair, create or import one in advance.

Logging (mandatory)
  An indication of whether to back up logs. If this parameter is set to Yes, you need to specify the OBS bucket where logs are stored.

Select Component (mandatory; example: Spark)
  MRS component. Options are Hadoop, Spark, HBase, and Hive.
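The On Schedule constraint above (the create/delete window must be longer than the schedule cycle) can be sketched as a simple check. This is an illustrative helper, not part of any DPS API; DPS performs this validation itself:

```python
from datetime import datetime, timedelta

def schedule_window_is_valid(creation: datetime, deletion: datetime,
                             cycle: timedelta) -> bool:
    """Return True when the create/delete window exceeds the schedule cycle.

    Illustrative sketch of the On Schedule rule; the function name and
    signature are not part of DPS.
    """
    return (deletion - creation) > cycle

# A two-hour window with a one-hour schedule cycle satisfies the rule.
ok = schedule_window_is_valid(datetime(2018, 1, 30, 8, 0),
                              datetime(2018, 1, 30, 10, 0),
                              timedelta(hours=1))
```

If the check fails, shorten the schedule cycle or widen the window between the creation and deletion times.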
Step 6 Click OK to complete the resource creation.
----End
4.3.3 Creating a CDM Resource
Scenario
DPS allows you to create a CDM cluster immediately, or create or delete a CDM cluster on demand or at a specific point in time. Proper DPS configurations help you fully utilize CDM clusters while reducing usage costs.
CDM clusters in the resource management list can be used by ExecuteCDM activities.
Prerequisites

- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- The current number of resources has not reached the resource quota. The resource quota is 20.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Resource List.
Step 4 On the Resource List page, click Create Resource.
Step 5 In the dialog box that is displayed, select CDM from the Type drop-down list. Configure the CDM resource parameters by referring to Table 4-14.
Table 4-14 CDM resource parameters

Schedule Type (mandatory)
  Resource scheduling type. Options are as follows:
  - On Demand: Resources are automatically created and deleted based on the resource usage of the pipeline.
  - At Once: Resources are created at once. If the resources are no longer used, you need to manually delete them.
  - On Schedule: Resources are automatically scheduled, created, and deleted at your specified resource schedule cycle, creation time, and deletion time. Ensure that the interval between the resource creation time and the resource deletion time is longer than the resource schedule cycle.

Resource Name (mandatory; example: cdm_source)
  Resource name. A resource name is 1 to 64 characters long and contains only letters, digits, hyphens (-), and underscores (_).

Cluster Name (mandatory; example: cdm-a4d3)
  CDM cluster name. A CDM cluster name is 4 to 64 characters long, contains only letters, digits, underscores (_), and hyphens (-), and must start with a letter.

Version (mandatory; example: 1.0.8T)
  CDM service version. Currently, only CDM 1.0.8T is supported.

VPC (mandatory; example: vpc-cdm-source)
  VPC in which the CDM cluster is created. Ensure that the CDM cluster and the data source to which the CDM cluster connects are in the same VPC.

Subnet (mandatory; example: subnet-cdm-source)
  Subnet of the CDM cluster. Ensure that the subnet of the CDM cluster can communicate with that of the data source.

Security Group (mandatory; example: sg-cdm)
  Security group to which the CDM cluster belongs. Ensure that the security group can access the data source.

Node Configuration (mandatory; example: cdm.medium)
  The following two types of node specifications are available:
  - cdm.medium: ECS server with a 4-core CPU and 8 GB of memory, suitable for a single database table with fewer than 10 million records.
  - cdm.large: ECS server with an 8-core CPU and 16 GB of memory, suitable for a single database table with more than 10 million records.

EIP (mandatory; example: Automatically Assign)
  An indication of whether to bind an EIP to the CDM cluster. If the CDM cluster needs to access a data source on the Internet, bind an EIP to the CDM cluster.

AZ (mandatory; example: cn-north-1b)
  AZ in which the CDM cluster is created.
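The Node Configuration choice above reduces to a record-count threshold. The helper below is an illustrative sketch of that rule only (it is not part of DPS or CDM):

```python
def pick_node_spec(record_count: int) -> str:
    """Map a single-table record count to a CDM node specification.

    Thresholds follow Table 4-14; the function itself is illustrative
    and not part of any DPS or CDM API.
    """
    # cdm.medium: 4-core CPU, 8 GB memory; tables under 10 million records.
    # cdm.large:  8-core CPU, 16 GB memory; tables over 10 million records.
    return "cdm.medium" if record_count < 10_000_000 else "cdm.large"
```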
Step 6 Click OK to complete the resource creation.
----End
4.3.4 Editing a Resource
Scenario
You can modify a resource that has been created if necessary.
Prerequisites

- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Resources have been created.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Resource List.
Step 4 On the Resource List page, click Edit in the Operation column for the resource to be edited.
Step 5 On the editing page, modify the resource information. For the description of each parameter, see the following:
- DIS resource: Table 4-11
- MRS resource: Table 4-13
- CDM resource: Table 4-14
Step 6 Click OK to save the modified parameter settings.
----End
4.3.5 Deleting a Resource
Scenario

You are advised to delete a resource that is no longer needed, to reduce quota usage.
Prerequisites

- The account used to access the public cloud management console has permission to access DPS. For details, see Permissions Required for Accessing DPS.
- Resources have been created but are not being used by pipelines.
Procedure
Step 1 Log in to the DPS console.
Step 2 Click in the upper left corner and select your region and project.
Step 3 In the navigation pane of the DPS console, click Resource List.
Step 4 On the Resource List page, click Delete in the Operation column for the resource to be deleted.
Step 5 In the displayed dialog box, click OK to delete the resource.
----End
5 Configuration Guide
5.1 Data Sources

A data source, such as OBS, RDS, or HDFS, indicates the location where data is stored.
5.1.1 RDS
Function

The RDS data source indicates the MySQL database of RDS and is used to store user data in the form of tables.

Configuration

On the Edit page, drag and drop the RDS data source to the edit grid area. Click the RDS data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-1.
Table 5-1 RDS properties

Name (mandatory; example: RDS_4171)
  Data source name.

Database (mandatory; example: DBname)
  Select a suitable connector from the connector list and use it as the database. To create a connector, click the creation icon or go to the DPS connector list page. For details about parameter settings, see Creating a DataSource Connector.

Table Name (mandatory; example: test)
  Name of the RDS table. You need to create the RDS table in advance using a standard SQL CREATE TABLE statement.
5.1.2 HBase

Function

The HBase data source indicates the HBase distributed cloud storage system of MRS and is applicable to massive data storage.

Configuration

On the Edit page, drag and drop the HBase data source to the edit grid area. Click the HBase data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-2.
Table 5-2 HBase properties

Name (mandatory; example: HBase_4653)
  Data source name.

HBASE Table Name (mandatory; example: datacsv2)
  Name of the HBase table. You need to create the HBase table in advance. Statement for creating a table: create 'test','d' (creates table test with column family d).

HBASE Columns (optional; example: HBASE_ROW_KEY,d:c2,d:c3)
  Columns of the HBase table. The columns must be created in advance.
5.1.3 HDFS

Function

The HDFS data source indicates the Hadoop Distributed File System (HDFS) of MRS and is applicable to large-scale data storage.

Configuration

On the Edit page, drag and drop the HDFS data source to the edit grid area. Click the HDFS data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-3.
Table 5-3 HDFS properties

Name (mandatory; example: HDFS_0745)
  Data source name.

MR Cluster (mandatory; example: DPS_using_mrs)
  MRS cluster. To create a cluster, perform any of the following operations:
  - Click the creation icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.

HDFS Path (mandatory; example: /user/omn/<scheduletime>)
  Storage path of the HDFS file. When HDFS is used as an output data source, HDFS Path supports the following variables:
  - <scheduletime>: A directory named after the time at which the pipeline starts running will be automatically created for storing pipeline output data.
  - <date>: A directory named after the current date will be automatically created for storing pipeline output data.
  - <yesterday>: A directory named after the previous day will be automatically created for storing pipeline output data.
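The path variables above resolve to per-run directories. The sketch below shows how such a substitution might work; the variable names come from Table 5-3, but the concrete date formats are assumptions for illustration and are not documented by DPS:

```python
from datetime import datetime, timedelta

def resolve_path(template: str, run_time: datetime) -> str:
    """Substitute DPS-style path variables into an HDFS/OBS path.

    Illustrative only: the date/time formats here are assumed, not
    taken from DPS documentation.
    """
    values = {
        "<scheduletime>": run_time.strftime("%Y%m%d%H%M%S"),
        "<date>": run_time.strftime("%Y-%m-%d"),
        "<yesterday>": (run_time - timedelta(days=1)).strftime("%Y-%m-%d"),
    }
    for variable, value in values.items():
        template = template.replace(variable, value)
    return template

# A template such as /user/omn/<scheduletime> yields a distinct
# output directory for each pipeline run.
path = resolve_path("/user/omn/<scheduletime>", datetime(2018, 1, 30, 8, 0))
```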
5.1.4 OBS

Function

The OBS data source indicates the data storage function of OBS and is used to store unstructured data, including documents, images, and videos.

Configuration

On the Edit page, drag and drop the OBS data source to the edit grid area. Click the OBS data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-4.
Table 5-4 OBS properties

Name (mandatory; example: OBS_5680)
  Data source name.

OBS Path (mandatory; example: s3a://dpsfile/<scheduletime>/)
  Path to the directory where OBS data is stored. When OBS is used as an output data source, OBS Path supports the following variables:
  - <scheduletime>: A directory named after the time at which the pipeline starts running will be automatically created for storing pipeline output data.
  - <date>: A directory named after the current date will be automatically created for storing pipeline output data.
  - <yesterday>: A directory named after the previous day will be automatically created for storing pipeline output data.
5.1.5 DWS

Function

The DWS data source indicates the data storage function of DWS.

Configuration

On the Edit page, drag and drop the DWS data source to the edit grid area. Click the DWS data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-5.
Table 5-5 DWS properties

Name (mandatory; example: DWS_0167)
  Data source name.

Database (mandatory; example: dws_test)
  Select a suitable connector from the connector list and use it as the database. To create a connector, click the creation icon or go to the DPS connector list page. For details about parameter settings, see Creating a DataSource Connector.

Table Name (mandatory; example: tabledws)
  Name of the database table. The table must be created in advance.
5.1.6 CDM Source

Function

CDM Source indicates that CDM is used as the input or output data source of pipelines.

Configuration

On the Edit page, drag and drop the CDM Source data source to the edit grid area. Click the CDM Source data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-6.
Table 5-6 CDM Source properties

Name (mandatory; example: CDM_Source_0167)
  Data source name.

CDM Source Connector (mandatory; example: cdm_s11)
  Select a suitable CDM connector from the connector list and use it as the CDM data source. To create a connector, click the creation icon or go to the DPS connector list page. For details about parameter settings, see Creating a CDM Connector.

CDM DB Schema (mandatory; example: columncdm)
  Name of the database schema. This parameter is displayed only when Link Type is set to generic-jdbc-connector during the creation of the CDM connector.

Database Table Name (mandatory; example: tablecdm)
  Name of the database table. You need to create the database table in advance. This parameter is displayed only when Link Type is set to generic-jdbc-connector during the creation of the CDM connector.

OBS Path (mandatory; example: s3a://dpsfile/)
  Path to the directory where OBS data is stored. When CDM Source is used as the output data source of the ExecuteCDM activity, this parameter can be set to a directory that does not exist in the OBS bucket; the directory will be created automatically by the ExecuteCDM activity. However, if this parameter specifies an OBS bucket that does not exist, the ExecuteCDM activity fails to be executed. This parameter is displayed only when Link Type is set to obs-connector during the creation of the CDM connector.
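The OBS Path rule above distinguishes the bucket (which must already exist) from the directory prefix (which ExecuteCDM can create). A minimal sketch of that split, assuming the s3a:// scheme shown in the examples and using names that are not part of any DPS API:

```python
def split_obs_path(path: str):
    """Split an s3a:// OBS path into (bucket, directory prefix).

    Illustrative helper: the bucket must pre-exist, while the prefix
    may be created automatically by the ExecuteCDM activity.
    """
    scheme = "s3a://"
    if not path.startswith(scheme):
        raise ValueError("expected an s3a:// path")
    bucket, _, directory = path[len(scheme):].partition("/")
    return bucket, directory
```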
5.1.7 Dummy

Function

Dummy is a data source that does not store any data. The Dummy data source is connected to an activity that does not require an input or output data source.

Configuration

On the Edit page, drag and drop the Dummy data source to the edit grid area. Click the Dummy data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, configure the data source name.
5.1.8 UQuery Table

Function

UQuery Table indicates the tables supported by UQuery, such as UQuery tables and OBS tables, and is used together with the UQuery<->OBS and UQuery SQL activities. For details, see UQuery<->OBS and UQuery SQL.

Configuration

On the Edit page, drag and drop the UQuery Table data source to the edit grid area. Click the UQuery Table data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-7.
Table 5-7 UQuery Table properties

Name (mandatory; example: UQuery_Table_9622)
  Data source name.

Database Name (mandatory; example: uquery_db)
  Name of a created UQuery database.
  - Select a created UQuery database.
  - If you select Create, an input box is displayed for you to enter the name of the database to be created.

Table Type (mandatory; example: OBS)
  Location where data is stored.
  - OBS: Data is stored in OBS buckets.
  - UQuery: Data is stored in UQuery.
  - View: Data table view.
  This parameter is displayed when you select a created database for Database Name.

Table Name (mandatory; example: uquery_table)
  Name of a created UQuery data table.
  - Select a created UQuery data table.
  - If you select Create, an input box is displayed for you to enter the name of the data table to be created.
  This parameter is displayed when you select a created database for Database Name.
5.1.9 ES Storage

Function

ES Storage indicates the data sources of ES and is used together with the Elasticsearch activity. For details, see Elasticsearch.

Configuration

On the Edit page, drag and drop the ES Storage data source to the edit grid area. Click the ES Storage data source.

- On the Input and Output tab pages on the left of the edit grid area, check the activities to which the data source can connect when it serves as an input or output data source.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items shown in Table 5-8.
Table 5-8 ES Storage properties

Name (mandatory; example: ES_Storage_7212)
  Data source name.

ES Connector (mandatory; example: dps_es)
  Select a suitable ESSource connector from the connector list and use it as the ES data source. To create a connector, click the creation icon or go to the DPS connector list page. For details about parameter settings, see Creating an ESSource Connector.
5.2 Activities

An activity defines a move or transfer operation performed on data. For example, the HDFS<->OBS activity can move data from OBS to HDFS.
5.2.1 HDFS->HBASE

Function

The HDFS->HBASE activity is used to import data files (in CSV format) stored in HDFS to HBase tables.

Configuration

On the Edit page, drag and drop the HDFS->HBASE activity to the edit grid area. Click the HDFS->HBASE activity.
Data Pipeline ServiceUser Guide 5 Configuration Guide
Issue 05 (2018-01-30) Huawei Proprietary and ConfidentialCopyright © Huawei Technologies Co., Ltd.
65
- On the Input and Output tab pages on the left of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-9.
Table 5-9 Link relationship between the HDFS->HBASE activity and the data sources
HDFS->HBASE: HDFS -> [HDFS->HBASE] -> HBase
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items in the following section.

Parameters

Properties
Table 5-10 describes the HDFS->HBASE properties.
NOTE

Common users do not need to configure Custom Jar File Path, Execution Class Name, or Execution Parameter File Path. Configure these three items only if you want to import custom data processing logic.
Table 5-10 HDFS->HBASE properties

Name (mandatory; example: HDFS_HBASE_3904)
  Activity name.

MR Cluster (mandatory; example: DPS_using_mrs)
  MRS cluster. To create a cluster, perform any of the following operations:
  - Click the creation icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.

Load Type (mandatory; example: BULKLOAD or INSERT)
  Data loading type. Possible values:
  - BULKLOAD: Load a large amount of data to HBase.
  - INSERT: Load a small amount of data to HBase.

File Backup (mandatory; example: Yes or No)
  Whether to back up the loaded files.

Backup Path (mandatory; example: /user/omm/)
  Path to the directory where loaded files are backed up.

Custom Jar File Path (optional; example: /user/omm/yu/loadhbase/customjar/customer.jar)
  Path to the custom JAR package.

Execution Class Name (optional; example: com.company.datacraft.hbase.ImportTsvCustom)
  Name of the execution class.

Execution Parameter File Path (optional; example: /user/omm/yu/loadhbase/arg/b.txt)
  Path to the execution parameter file.

Log Path (mandatory; example: s3a://dps/log/)
  Path to the directory where logs are stored. Log Path supports the following variables:
  - <scheduletime>: A directory named after the time at which the pipeline starts running will be automatically created for storing log files.
  - <date>: A directory named after the current date will be automatically created for storing log files.
  - <yesterday>: A directory named after the previous day will be automatically created for storing log files.
Data Pipeline ServiceUser Guide 5 Configuration Guide
Issue 05 (2018-01-30) Huawei Proprietary and ConfidentialCopyright © Huawei Technologies Co., Ltd.
67
Precondition
Table 5-11 describes the parameters of an HDFS->HBASE precondition.
NOTE
A maximum of five preconditions can be added.
Table 5-11 Precondition parameters

Action After Precondition Check Failure (mandatory)
  Action that will be performed if the precondition is not met. Options are as follows:
  - Exit: Exit the activity.
  - Continue: Execute the activity.

Check Method (mandatory)
  Options are as follows:
  - Meet all types: The system executes the activity only when all preconditions are met.
  - Meet any type: The system executes the activity if any precondition is met.

Precondition Type (mandatory)
  Type of the precondition. Options are as follows:
  - Check whether the file exists.
  - Check the number of files in the folder.
  - Check the file size.
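The two check methods above behave like logical AND and OR over the individual precondition results. A minimal sketch, using the option names from the table (the function itself is illustrative, not DPS code):

```python
def preconditions_met(results, check_method):
    """Combine precondition results using the table's two check methods.

    'Meet all types' requires every precondition to pass; 'Meet any
    type' requires at least one. Illustrative sketch only.
    """
    if check_method == "Meet all types":
        return all(results)
    if check_method == "Meet any type":
        return any(results)
    raise ValueError("unknown check method: %s" % check_method)
```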
Advanced settings

The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-12 describes the advanced parameters.
Table 5-12 Advanced parameter settings

Retry upon Failure (mandatory)
  An indication of whether to re-execute the activity if it fails to be executed.
  - Yes: Re-execute the activity. Configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
  Default value: No.

Failure Policy (mandatory)
  Operation that will be performed if the activity re-execution still fails.
  - End the current job execution plan.
  - Proceed to the next job.
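The retry policy above (Maximum Retries plus Retry Interval, then the failure policy) can be sketched as a simple loop. This is not DPS code; `activity` stands for any callable that raises an exception on failure:

```python
import time

def run_with_retry(activity, max_retries, retry_interval):
    """Re-execute a failing activity per the advanced settings above.

    Illustrative sketch of the documented policy: retry up to
    max_retries times, sleeping retry_interval seconds between tries.
    """
    attempts = 0
    while True:
        try:
            return activity()
        except Exception:
            attempts += 1
            if attempts > max_retries:
                # Failure policy: end the current job execution plan.
                raise
            time.sleep(retry_interval)
```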
5.2.2 HDFS<->OBS

Function

The HDFS<->OBS activity uses MapReduce jobs to implement distributed file copy and transfers data between HDFS and OBS.

Configuration

On the Edit page, drag and drop the HDFS<->OBS activity to the edit grid area. Click the HDFS<->OBS activity.
- On the Input and Output tab pages on the left of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-13.

Table 5-13 Link relationships between the HDFS<->OBS activity and the data sources

HDFS<->OBS:
  OBS -> [HDFS<->OBS] -> HDFS
  HDFS -> [HDFS<->OBS] -> OBS

- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-14 describes the HDFS<->OBS properties.

Table 5-14 HDFS<->OBS properties

Name (mandatory; example: HDFS_OBS_3603)
  Activity name.

MR Cluster (mandatory; example: DPS_using_mrs)
  MRS cluster. To create a cluster, perform any of the following operations:
  - Click the creation icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.

Job Name (mandatory; example: HDFS_OBS)
  MRS job name.

Log Path (optional; example: s3a://dps/log/)
  Path to the directory where logs are stored.
Precondition
Table 5-15 describes the parameters of an HDFS<->OBS precondition.
NOTE
A maximum of five preconditions can be added.
Table 5-15 Precondition parameters

Action After Precondition Check Failure (mandatory)
  Action that will be performed if the precondition is not met. Options are as follows:
  - Exit: Exit the activity.
  - Continue: Execute the activity.

Check Method (mandatory)
  Options are as follows:
  - Meet all types: The system executes the activity only when all preconditions are met.
  - Meet any type: The system executes the activity if any precondition is met.

Precondition Type (mandatory)
  Type of the precondition. Options are as follows:
  - Check whether the file exists.
  - Check the number of files in the folder.
  - Check the file size.
Advanced settings

The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-16 describes the advanced parameters.
Table 5-16 Advanced parameter settings

Retry upon Failure (mandatory)
  An indication of whether to re-execute the activity if it fails to be executed.
  - Yes: Re-execute the activity. Configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
  Default value: No.

Failure Policy (mandatory)
  Operation that will be performed if the activity re-execution still fails.
  - End the current job execution plan.
  - Proceed to the next job.
5.2.3 Database<->HDFS

Function

The Database<->HDFS activity is used to transfer data between HDFS and RDS.

Configuration

On the Edit page, drag and drop the Database<->HDFS activity to the edit grid area. Click the Database<->HDFS activity.
- On the Input and Output tab pages on the left of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-17.

Table 5-17 Link relationships between the Database<->HDFS activity and the data sources

Database<->HDFS:
  HDFS -> [Database<->HDFS] -> RDS
  RDS -> [Database<->HDFS] -> HDFS

NOTE

In the pipeline RDS -> [Database<->HDFS] -> HDFS, ensure that the directory specified by the HDFS Path property of the HDFS data source does not exist.

For example, if HDFS Path is set to /user/omm/yourfile, /user/omm is an existing directory, and yourfile is a user-defined directory that does not yet exist in the /user/omm directory.
- On the configuration page displayed on the right of the edit grid area, view and edit the configuration items in the following section.

Parameters

Properties
Table 5-18 describes the Database<->HDFS properties.
Table 5-18 Database<->HDFS properties

Name (mandatory; example: Database_HDFS_5690)
  Activity name.

MR Cluster (mandatory; example: DPS_using_mrs)
  MRS cluster. To create a cluster, perform any of the following operations:
  - Click the creation icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.

Job Name (mandatory; example: RDS_HDFS)
  MRS job name.

Database<->HDFS Job Parameters (optional; example: -m 1, which starts one Map process to execute the task)
  Sqoop command parameters.
  NOTE: The Sqoop component is used in the Database<->HDFS activity. Enter the Sqoop command parameters here.

Log Path (mandatory; example: s3a://dps/log/)
  Path to the directory where logs are stored. Log Path supports the following variables:
  - <scheduletime>: A directory named after the time at which the pipeline starts running will be automatically created for storing log files.
  - <date>: A directory named after the current date will be automatically created for storing log files.
  - <yesterday>: A directory named after the previous day will be automatically created for storing log files.

HDFS Subdirectory (optional; example: /chd)
  Subdirectory under the directory specified by the HDFS Path property of the HDFS data source. This parameter specifies the path to the input data source of the Database<->HDFS activity only when HDFS is used as the input data source.
  NOTE: HDFS Path is a property of the HDFS data source. For details, see Table 5-3.
Precondition
Table 5-19 describes the parameters of a Database<->HDFS precondition.
NOTE
A maximum of five preconditions can be added.
Table 5-19 Precondition parameters
- Action After Precondition Check Failure (mandatory): Action that is performed if the precondition is not met. Options:
  - Exit: Exit the activity.
  - Continue: Execute the activity.
- Check Method (mandatory): Options:
  - Meet all types: The system executes the activity only when all preconditions are met.
  - Meet any type: The system executes the activity if any precondition is met.
- Precondition Type (mandatory): Type of the precondition. Options:
  - Check whether the file exists.
  - Check the number of files in the folder.
  - Check the file size.
  - Check whether the database table exists.
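The two check methods map directly onto "all preconditions met" versus "any precondition met". A sketch of that decision (the function and parameter names are illustrative, not part of DPS):

```python
def should_run(results, check_method):
    """Decide whether the activity runs, given precondition results.

    results: list of booleans, one per precondition (at most five).
    check_method: "Meet all types" or "Meet any type".
    """
    if check_method == "Meet all types":
        return all(results)   # every precondition must hold
    return any(results)        # one satisfied precondition is enough

print(should_run([True, False], "Meet any type"))   # True
print(should_run([True, False], "Meet all types"))  # False
```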
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-20 describes the advanced parameters.
Table 5-20 Advanced parameter settings
- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
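The Retry upon Failure parameters describe a standard bounded-retry loop. A minimal sketch (function names are illustrative; the timeout handling DPS applies per attempt is omitted):

```python
import time

def run_with_retry(activity, max_retries, retry_interval_s):
    """Re-execute a failing activity, as Retry upon Failure describes.

    Returns True on success; False once all retries are exhausted,
    at which point the Failure policy decides what happens next.
    """
    for attempt in range(1 + max_retries):
        if activity():
            return True
        if attempt < max_retries:
            time.sleep(retry_interval_s)
    return False

attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    return attempts["n"] >= 3   # succeeds on the third call

print(run_with_retry(flaky, max_retries=3, retry_interval_s=0))  # True
```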
5.2.4 UQuery<->OBS
Function

The UQuery<->OBS activity is used to transfer data between an OBS bucket and a UQuery table. This activity supports data transfer only through CSV files.

Configuration

On the Edit page, drag and drop the UQuery<->OBS activity to the edit grid area. Click the UQuery<->OBS activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-21.
Table 5-21 Link relationship between the UQuery<->OBS activity and the data sources
UQuery<->OBS:
- OBS -> [UQuery<->OBS] -> UQuery Table
- UQuery Table -> [UQuery<->OBS] -> OBS

NOTE: When data is transferred from the UQuery Table data source to the OBS data source, a folder is automatically created in OBS to store the transferred data. Ensure that this folder (specified by the OBS Path property of the OBS data source) does not already exist in the OBS bucket. For example, if OBS Path is set to s3a://dpsfile/new/, new is the folder to be created automatically; ensure that it does not exist in the s3a://dpsfile/ directory.

- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
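The constraint on OBS Path (the named folder must not already exist) can be checked by splitting the path into the parent directory and the folder DPS will create. A sketch of that split (the helper is illustrative only):

```python
def split_obs_path(obs_path):
    """Split an OBS Path such as s3a://dpsfile/new/ into the parent
    directory and the folder DPS will create automatically."""
    trimmed = obs_path.rstrip("/")
    parent, _, folder = trimmed.rpartition("/")
    return parent + "/", folder

print(split_obs_path("s3a://dpsfile/new/"))  # ('s3a://dpsfile/', 'new')
```

Before running the activity, one would verify that `folder` is absent under `parent` in the OBS bucket.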
Parameters

Properties

Table 5-22 describes the UQuery<->OBS properties.
Table 5-22 UQuery<->OBS properties
- Name (mandatory): Activity name. Example value: UQuery_OBS_5844
- Import/Export (mandatory): Data transfer direction.
  - When data is transferred from the OBS data source to the UQuery Table data source, select Import(OBS->UQuery).
  - When data is transferred from the UQuery Table data source to the OBS data source, select Export(UQuery->OBS).
  Example value: Import
- Log Path (optional): Path of the execution log. Example value: s3a://dps/log/
- File Format (mandatory): Format of the file used to transfer data between the UQuery Table data source and the OBS data source. Currently, only CSV files are supported. Example value: csv
- Compression Format (mandatory): File compression type. Options: Not compress, gzip, bzip2, deflate. This parameter is displayed when Import/Export is set to Export(UQuery->OBS). Example value: none
- Advanced Options (optional): Format of the target custom data table. This parameter is displayed when Import/Export is set to Import(OBS->UQuery). Example value: -
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-23 describes the advanced parameters.

Table 5-23 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.5 CDM Job
Function

The CDM Job activity is used to execute a CDM job that has been created in CDM.

Configuration

On the Edit page, drag and drop the CDM Job activity to the edit grid area. Click the CDM Job activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-24.

Table 5-24 Link relationship between the CDM Job activity and the data sources

CDM Job:
- Any data source -> CDM Job -> any data source
- The CDM Job activity can also be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.

NOTE: Connecting the CDM Job activity to a data source or activity only forms a complete pipeline; running the CDM Job activity does not affect the connected data source or activity.
- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.

Parameters

Properties

Table 5-25 describes the CDM Job properties.
Table 5-25 CDM Job properties
- Name (mandatory): Activity name. Example value: CDM_Job_1829
- CDM Cluster Name (mandatory): Cluster to which the CDM job belongs. Example value: cdm-dps
- CDM Job Name (mandatory): CDM job name. Example value: cdmjobt
- Log Path (optional): Path of the log. Example value: s3a://dps/log/
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-26 describes the advanced parameters.

Table 5-26 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.6 ExecuteCDM
Function

The ExecuteCDM activity creates jobs in the CDM cluster and migrates cloud data by executing those jobs.

Configuration

On the Edit page, drag and drop the ExecuteCDM activity to the edit grid area. Click the ExecuteCDM activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-27.

Table 5-27 Link relationship between the ExecuteCDM activity and the data sources

ExecuteCDM: CDM Source/OBS -> ExecuteCDM -> CDM Source/OBS

- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.

Parameters

Properties

Table 5-28 describes the ExecuteCDM properties.
Table 5-28 ExecuteCDM properties
- Name (mandatory): Activity name. Example value: ExecuteCDM_2113
- CDM Job Name (mandatory): Name of a new CDM job. The name contains only letters and digits and is not longer than 21 characters. Example value: cdmjob
- CDM Cluster Name (mandatory): Cluster to which the CDM job belongs. To create a cluster, do one of the following:
  - Click the create icon to create a CDM cluster as required. For details about parameter settings, see Creating a CDM Resource.
  - Go to the DPS resource management list page and create a CDM cluster.
  - Go to the CDM management console and create a CDM cluster.
  Example value: cdm-dps
- Log Path (optional): Path of the log. Example value: s3a://dps/log/
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-29 describes the advanced parameters.

Table 5-29 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.7 Spark
Function

The Spark activity is used to execute a predefined Spark job on MRS.

Configuration

On the Edit page, drag and drop the Spark activity to the edit grid area. Click the Spark activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-30.

Table 5-30 Link relationship between the Spark activity and the data sources

Spark: OBS/HDFS/Dummy -> Spark -> OBS/HDFS/Dummy

- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.

Parameters

Properties

Table 5-31 describes the Spark properties.
Table 5-31 Spark properties
- Name (mandatory): Activity name. Example value: Spark_2350
- MR Cluster (mandatory): MRS cluster. To create a cluster, do one of the following:
  - Click the create icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.
  Example value: DPS_using_mrs
- Job Name (mandatory): MRS job name. Example value: Spark
- Jar File Path (mandatory): Path to the JAR package of the Spark job. Example value: s3a://dpsfile/program/spark-test.jar
- Jar File Parameters (optional): Variables required for executing the JAR package. Example value: com.spark.test.JavaWordCountWithSave
- Log path (mandatory): Path to the directory where logs are stored. The log path supports the following variables:
  - <scheduletime>: a directory named after the time at which the pipeline starts running is created automatically for storing log files.
  - <date>: a directory named after the current date is created automatically for storing log files.
  - <yesterday>: a directory named after the previous day is created automatically for storing log files.
  Example value: s3a://dpsfile/log/<scheduletime>/
Precondition
Table 5-32 describes the parameters of a Spark precondition.
NOTE
A maximum of five preconditions can be added.
Table 5-32 Precondition parameters
- Action After Precondition Check Failure (mandatory): Action that is performed if the precondition is not met. Options:
  - Exit: Exit the activity.
  - Continue: Execute the activity.
- Check Method (mandatory): Options:
  - Meet all types: The system executes the activity only when all preconditions are met.
  - Meet any type: The system executes the activity if any precondition is met.
- Precondition Type (mandatory): Type of the precondition. Options:
  - Check whether the file exists.
  - Check the number of files in the folder.
  - Check the file size.
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-33 describes the advanced parameters.

Table 5-33 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.8 SparkSQL
Function

The SparkSQL activity is used to execute predefined SparkSQL statements on MRS.

Configuration

On the Edit page, drag and drop the SparkSQL activity to the edit grid area. Click the SparkSQL activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-34.

Table 5-34 Link relationship between the SparkSQL activity and the data sources

SparkSQL: Any data source -> SparkSQL -> any data source

NOTE: Connecting the SparkSQL activity to a data source only forms a complete pipeline; running the SparkSQL activity does not affect the connected data source.

- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.

Parameters

Properties

Table 5-35 describes the SparkSQL properties.
Table 5-35 SparkSQL properties
- Name (mandatory): Activity name. Example value: SparkSQL_4667
- MR Cluster (mandatory): MRS cluster. To create a cluster, do one of the following:
  - Click the create icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.
  Example value: DPS_using_mrs
- Job Name (mandatory): MRS job name. Example value: sparkSql
- Statements (mandatory): Spark SQL statements to be executed. Statements are separated by semicolons (;). Example value: show tables;
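Because statements are separated by semicolons, a script must be split into individual statements before execution. A naive split looks like this (illustrative only; a real SQL parser would also have to skip semicolons inside quoted strings):

```python
def split_statements(script):
    """Split a SparkSQL script on semicolons, dropping empty fragments."""
    return [s.strip() for s in script.split(";") if s.strip()]

print(split_statements("show tables; select * from t;"))
# ['show tables', 'select * from t']
```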
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-36 describes the advanced parameters.

Table 5-36 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.9 Hive
Function
The Hive activity is used to execute predefined Hive script files on MRS.

Configuration

On the Edit page, drag and drop the Hive activity to the edit grid area. Click the Hive activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-37.
Table 5-37 Link relationship between the Hive activity and the data sources
Hive: OBS/HDFS/Dummy -> Hive -> OBS/HDFS/Dummy
- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-38 describes the Hive properties.
Table 5-38 Hive properties
- Name (mandatory): Activity name. Example value: Hive_2909
- MR Cluster (mandatory): MRS cluster. To create a cluster, do one of the following:
  - Click the create icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.
  Example value: DPS_using_mrs
- Job Name (mandatory): MRS job name. Example value: Hive
- Hive Script Path (mandatory): Path to the Hive script. Example value: s3a://dpsfile/program/hivescript.sql
- Script Parameters (optional): Variables required for executing the Hive script. By default, this property is left blank.
- Log path (mandatory): Path to the directory where logs are stored. The log path supports the following variables:
  - <scheduletime>: a directory named after the time at which the pipeline starts running is created automatically for storing log files.
  - <date>: a directory named after the current date is created automatically for storing log files.
  - <yesterday>: a directory named after the previous day is created automatically for storing log files.
  Example value: s3a://dpsfile/log/
Precondition
Table 5-39 describes the parameters of a Hive precondition.
NOTE
A maximum of five preconditions can be added.
Table 5-39 Precondition parameters
- Action After Precondition Check Failure (mandatory): Action that is performed if the precondition is not met. Options:
  - Exit: Exit the activity.
  - Continue: Execute the activity.
- Check Method (mandatory): Options:
  - Meet all types: The system executes the activity only when all preconditions are met.
  - Meet any type: The system executes the activity if any precondition is met.
- Precondition Type (mandatory): Type of the precondition. Options:
  - Check whether the file exists.
  - Check the number of files in the folder.
  - Check the file size.
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-40 describes the advanced parameters.

Table 5-40 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.10 MapReduce
Function
The MapReduce activity is used to run a predefined MapReduce program on MRS.

Configuration

On the Edit page, drag and drop the MapReduce activity to the edit grid area. Click the MapReduce activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-41.

Table 5-41 Link relationship between the MapReduce activity and the data sources

MapReduce: OBS/HDFS/Dummy -> MapReduce -> OBS/HDFS/Dummy

- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-42 describes the MapReduce properties.
Table 5-42 MapReduce properties
- Name (mandatory): Activity name. Example value: MapReduce_8300
- MR Cluster (mandatory): MRS cluster. To create a cluster, do one of the following:
  - Click the create icon to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
  - Go to the DPS resource management list page and create an MRS cluster.
  - Go to the MRS management console and create an MRS cluster.
  Example value: DPS_using_mrs
- Job Name (mandatory): MRS job name. Example value: MR
- Jar File Path (mandatory): Path to the JAR package. Example value: s3a://dpsfile/program/hadoop-mapreduce-examples-2.7.1.jar
- Jar File Parameters (optional): Variables required for executing the JAR package. Example value: wordcount
- Log path (mandatory): Path to the directory where logs are stored. The log path supports the following variables:
  - <scheduletime>: a directory named after the time at which the pipeline starts running is created automatically for storing log files.
  - <date>: a directory named after the current date is created automatically for storing log files.
  - <yesterday>: a directory named after the previous day is created automatically for storing log files.
  Example value: s3a://dpsfile/log/
Precondition
Table 5-43 describes the parameters of a MapReduce precondition.
NOTE
A maximum of five preconditions can be added.
Table 5-43 Precondition parameters
- Action After Precondition Check Failure (mandatory): Action that is performed if the precondition is not met. Options:
  - Exit: Exit the activity.
  - Continue: Execute the activity.
- Check Method (mandatory): Options:
  - Meet all types: The system executes the activity only when all preconditions are met.
  - Meet any type: The system executes the activity if any precondition is met.
- Precondition Type (mandatory): Type of the precondition. Options:
  - Check whether the file exists.
  - Check the number of files in the folder.
  - Check the file size.
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-44 describes the advanced parameters.

Table 5-44 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.11 Shell Script
Function
The Shell Script activity is used to execute shell scripts specified by users on the ECS server.

Configuration

On the Edit page, drag and drop the Shell Script activity to the edit grid area. Click the Shell Script activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-45.

Table 5-45 Link relationships between the Shell Script activity and the data sources

Shell Script:
- Any data source -> Shell Script -> any data source
- The Shell Script activity can also be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.

NOTE: Connecting the Shell Script activity to a data source or activity only forms a complete pipeline; running the Shell Script activity does not affect the connected data source or activity.

- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-46 describes the Shell Script properties.
Table 5-46 Shell Script properties
- Name (mandatory): Activity name. Example value: ShellScript_9167
- Compute Resource (mandatory): Name of the DPS Agent that has been registered on the ECS server. NOTE: If the installed DPS Agent is not available, contact technical support. Example value: test
- Script Path (mandatory): Absolute path to the shell script on the ECS server. Example value: /tmp/test.sh
- Log Backup Required (mandatory): Whether to back up logs. Example value: True
- Log path (optional): Log backup directory. This parameter is required only when Log Backup Required is set to True. The log path supports the following variables:
  - <scheduletime>: a directory named after the time at which the pipeline starts running is created automatically for storing log files.
  - <date>: a directory named after the current date is created automatically for storing log files.
  - <yesterday>: a directory named after the previous day is created automatically for storing log files.
  Example value: s3a://dpsfile/log/
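Functionally, the activity runs the script and, when Log Backup Required is True, copies the run's output into the log directory. A local sketch of that behavior (the file names and local log directory are illustrative stand-ins for the DPS Agent and the s3a:// backup path):

```python
import os
import subprocess
import tempfile

def run_script(script_path, backup_logs, log_dir):
    """Execute a shell script; optionally back up its stdout to log_dir."""
    result = subprocess.run(["sh", script_path], capture_output=True, text=True)
    if backup_logs:
        os.makedirs(log_dir, exist_ok=True)
        with open(os.path.join(log_dir, "stdout.log"), "w") as f:
            f.write(result.stdout)
    return result.returncode

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "test.sh")
    with open(script, "w") as f:
        f.write("echo hello\n")
    rc = run_script(script, backup_logs=True, log_dir=os.path.join(tmp, "log"))
    print(rc)  # 0
```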
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-47 describes the advanced parameters.

Table 5-47 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.12 MachineLearning
Function
The MachineLearning activity is used to execute the workflows of Machine Learning Service (MLS).

Configuration

On the Edit page, drag and drop the MachineLearning activity to the edit grid area. Click the MachineLearning activity.

- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-48.

Table 5-48 Link relationship between the MachineLearning activity and the data sources

MachineLearning: HDFS/Dummy -> MachineLearning -> HDFS/Dummy

NOTE: Connecting the MachineLearning activity to a data source only forms a complete pipeline; running the MachineLearning activity does not affect the connected data source.
- On the configuration page displayed on the right side of the edit grid area, view and edit the configuration items in the following section.

Parameters

Properties

Table 5-49 describes the MachineLearning properties.
Table 5-49 MachineLearning properties
- Name (mandatory): Activity name. Example value: MachineLearning_2113
- MLS Instance Name (mandatory): Name of the MLS instance. To create an instance, click Create MLS Instance or go to the MLS management console. Example value: mls-7da7
- MLS Project Name (mandatory): Name of the MLS project. Example value: projectname
- MLS Workflow Name (mandatory): Name of the MLS workflow. Example value: test
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-50 describes the advanced parameters.

Table 5-50 Advanced parameter settings

- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that is performed if the activity re-execution still fails. Options:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.13 Elasticsearch
Function
The Elasticsearch activity is used to execute ES requests (GET, PUT, POST, HEAD, andDELETE requests).
Configuration
On the Edit page, drag and drop the Elasticsearch activity to the edit grid area. Click theElasticsearch activity.
l On the Input and Output tab pages at the left side of the edit grid area, check the inputand output data sources to which the activity connects, as shown in Table 5-51.
Table 5-51 Link relationship between the Elasticsearch activity and the data sources
- Elasticsearch:
  - OBS/ES Storage/Dummy -> Elasticsearch -> ES Storage
  - ES Storage -> Elasticsearch -> OBS/ES Storage/Dummy
- On the configuration page that is displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-52 describes the Elasticsearch properties.
Table 5-52 Elasticsearch properties
- Name (mandatory): Activity name. Example value: Elasticsearch_6150
- Compute Resource (mandatory): Name of the DPS Agent that has been registered on the ECS server. Example value: test
  NOTE:
  - Ensure that the security group of the ES cluster in ES Storage is the same as the security group of the DPS Agent.
  - If the installed DPS Agent is not available, contact technical support.
- Request Type (mandatory): Request to be executed. Options: GET, POST, PUT, HEAD, and DELETE. Example value: GET
- Request Parameter (optional): Request parameter. For example, to query the dpsdata mapping type of the dps_search index, the request parameter is /dps_search/dpsdata/_search. Example value: /dps_search/dpsdata/_search
- Request Body (optional): JSON-format request body. Example value: {"query": {"constant_score": {"filter": {"terms": {"price": [200,300]}}}}}
- Log path (mandatory): Path to the directory where logs are stored. Example value: s3a://dpsfile/log/
  Log path supports the following variables:
  - <scheduletime>: A directory named after the time at which the pipeline starts running is automatically created for storing log files.
  - <date>: A directory named after the current date is automatically created for storing log files.
  - <yesterday>: A directory named after the previous day is automatically created for storing log files.
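Conceptually, the activity composes an HTTP request to the ES cluster from Request Type, Request Parameter, and Request Body. The sketch below only builds such a request and does not contact a cluster; the endpoint URL and function name are hypothetical, not DPS internals.

```python
import json

# Request types supported by the Elasticsearch activity (Table 5-52).
ALLOWED_TYPES = {"GET", "POST", "PUT", "HEAD", "DELETE"}

def build_es_request(request_type, request_parameter="", request_body=None,
                     endpoint="http://es.example.com:9200"):
    """Compose the HTTP method, full URL, and JSON body for an ES request."""
    method = request_type.upper()
    if method not in ALLOWED_TYPES:
        raise ValueError("unsupported request type: %s" % request_type)
    url = endpoint.rstrip("/") + request_parameter
    body = json.dumps(request_body) if request_body is not None else None
    return method, url, body

# Example matching Table 5-52: query the dpsdata type of the dps_search index.
method, url, body = build_es_request(
    "GET", "/dps_search/dpsdata/_search",
    {"query": {"constant_score": {"filter": {"terms": {"price": [200, 300]}}}}})
```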
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-53 describes the advanced parameters.
Table 5-53 Advanced parameter settings
- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails to be executed. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that will be performed if the activity re-execution still fails. Options are as follows:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.14 RDS SQL
Function
The RDS SQL activity transfers SQL statements (DML and DDL) to RDS, and RDS then executes the SQL statements.
Configuration
On the Edit page, drag and drop the RDS SQL activity to the edit grid area. Click the RDS SQL activity.
- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-54.
Table 5-54 Link relationship between the RDS SQL activity and the data sources
- RDS SQL: RDS -> RDS SQL -> RDS
  NOTE: The input and output RDS data sources must be in the same database.
- On the configuration page that is displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-55 describes the RDS SQL properties.
Table 5-55 RDS SQL properties
- Name (mandatory): Activity name. Example value: RDS_SQL_4574
- Compute Resource (mandatory): Running environment of the activity. Example value: DPS_using_mrs
  Options are as follows:
  - MR Cluster. To create a cluster, perform one of the following operations:
    - Click to create an MRS cluster as required. For details about parameter settings, see Creating an MRS Resource.
    - Go to the DPS resource management list page and create an MRS cluster.
    - Go to the MRS management console and create an MRS cluster.
  - ComputeResource: DPS Agent that has been registered with ECS.
  NOTE:
  - If an MR cluster is selected as the running environment of the activity, ensure that the security group of RDS is the same as that of the master node of the MR cluster.
  - If the installed DPS Agent is not available, contact technical support.
- Log path (mandatory): Path to the directory where logs are stored. Example value: s3a://dps/log/
  Log path supports the following variables:
  - <scheduletime>: A directory named after the time at which the pipeline starts running is automatically created for storing log files.
  - <date>: A directory named after the current date is automatically created for storing log files.
  - <yesterday>: A directory named after the previous day is automatically created for storing log files.
- Statements (mandatory): Statements to be executed. Use a semicolon (;) to separate statements. The following SQL statements are supported: CREATE, DROP, and ALTER; INSERT, DELETE, UPDATE, and CALL. Example value: INSERT INTO test VALUES ('values1', 25);
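Because Statements uses a semicolon as a separator, the value presumably contains several statements run in turn. A naive split can be sketched as follows; note this simplification does not handle semicolons inside string literals, and it is an illustration rather than DPS's actual parser.

```python
def split_statements(statements):
    """Split a Statements value on semicolons and drop empty fragments.
    Naive sketch: semicolons inside quoted strings are not handled."""
    return [s.strip() for s in statements.split(";") if s.strip()]

stmts = split_statements(
    "CREATE TABLE test (name VARCHAR(20), age INT);"
    "INSERT INTO test VALUES ('values1', 25);")
```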
Precondition
Table 5-56 describes parameters of an RDS SQL precondition.
NOTE
- A maximum of five preconditions can be added.
- When Compute Resource is set to ComputeResource, preconditions cannot be configured.
Table 5-56 Precondition parameters
- Action After Precondition Check Failure (mandatory): Action that will be performed if the precondition is not met. Options are as follows:
  - Exit: Exit the activity.
  - Continue: Execute the activity.
- Check Method (mandatory): Options are as follows:
  - Meet all types: The system executes the activity only when all preconditions are met.
  - Meet any type: The system executes the activity if any precondition is met.
- Precondition Type (mandatory): Type of the precondition. Options are as follows:
  - Check whether the database table exists.
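The two check methods map naturally onto all-of versus any-of semantics. The sketch below models precondition results as booleans (True means the precondition is met); the function name is illustrative, not a DPS API.

```python
def should_execute(results, check_method="Meet all types"):
    """Decide whether the activity runs, given the precondition
    results and the configured Check Method."""
    if check_method == "Meet all types":
        return all(results)   # every precondition must be met
    if check_method == "Meet any type":
        return any(results)   # one met precondition is enough
    raise ValueError("unknown check method: %s" % check_method)
```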
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-57 describes the advanced parameters.
Table 5-57 Advanced parameter settings
- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails to be executed. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that will be performed if the activity re-execution still fails. Options are as follows:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.15 DWS SQL
Function
The DWS SQL activity transfers SQL statements (DML and DDL) to DWS, and DWS then executes the SQL statements.
Configuration
On the Edit page, drag and drop the DWS SQL activity to the edit grid area. Click the DWS SQL activity.
- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-58.
Table 5-58 Link relationship between the DWS SQL activity and the data sources
- DWS SQL: DWS -> DWS SQL -> DWS
- On the configuration page that is displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-59 describes the DWS SQL properties.
Table 5-59 DWS SQL properties
- Name (mandatory): Activity name. Example value: DWS_SQL_2113
- Compute Resource (mandatory): Name of the DPS Agent that has been registered on the ECS server. Example value: test
  NOTE: If the installed DPS Agent is not available, contact technical support.
- Statements (mandatory): SQL statements. Use a semicolon (;) to separate two statements. The following SQL statements are supported: CREATE, DROP, and ALTER; INSERT, DELETE, UPDATE, and CALL. Example value: CREATE TABLE test9(callee_number varchar(20));
- Log Backup Required (mandatory): Whether to back up logs. Example value: True
- Log path (optional): Log backup directory. This parameter is required only when Log Backup Required is set to True. Example value: s3a://dpsfile/log/
  Log path supports the following variables:
  - <scheduletime>: A directory named after the time at which the pipeline starts running is automatically created for storing log files.
  - <date>: A directory named after the current date is automatically created for storing log files.
  - <yesterday>: A directory named after the previous day is automatically created for storing log files.
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-60 describes the advanced parameters.
Table 5-60 Advanced parameter settings
- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails to be executed. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that will be performed if the activity re-execution still fails. Options are as follows:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.16 UQuery SQL
Function
The UQuery SQL activity is used to transfer SQL statements to UQuery to implement cloud data queries.
Configuration
On the Edit page, drag and drop the UQuery SQL activity to the edit grid area. Click the UQuery SQL activity.
- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-61.
Table 5-61 Link relationship between the UQuery SQL activity and the data sources
- UQuery SQL:
  - OBS -> UQuery SQL -> UQuery Table
  - UQuery Table -> UQuery SQL -> UQuery Table
  NOTE: In the pipeline UQuery Table -> UQuery SQL -> UQuery Table, the input and output UQuery Table data sources must be set to the same database or data table.
- On the configuration page that is displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-62 describes the UQuery SQL properties.
Table 5-62 UQuery SQL properties
- Name (mandatory): Activity name. Example value: UQuery_SQL_4796
- Queue Name (mandatory): Name of a created UQuery queue. Example value: dps_uquery
- Query (mandatory): Only the SQL statements that start with CREATE, DROP, ALTER, or INSERT are supported. Example value: create table <tablename> (id int, name string) using csv options (path '<obspath>');
  SQL statements may contain the following variables:
  - <obspath>: OBS bucket path.
  - <tablename>: UQuery data table name.
  - <databasename>: UQuery database name.
- Log Path (optional): Path of the execution log. Example value: s3a://dps/log/
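The <obspath>, <tablename>, and <databasename> placeholders behave like simple template variables in the query text. How DPS expands them internally is not documented here, so the sketch below is only an assumed, illustrative model of the substitution.

```python
def expand_query(query, obspath=None, tablename=None, databasename=None):
    """Replace the UQuery SQL placeholder variables in a query template."""
    for key, value in (("<obspath>", obspath),
                       ("<tablename>", tablename),
                       ("<databasename>", databasename)):
        if value is not None:
            query = query.replace(key, value)
    return query

# Expanding the example value from Table 5-62 (names are illustrative).
sql = expand_query(
    "create table <tablename> (id int, name string) using csv options (path '<obspath>')",
    obspath="s3a://dps/data/", tablename="demo_table")
```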
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-63 describes the advanced parameters.
Table 5-63 Advanced parameter settings
- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails to be executed. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that will be performed if the activity re-execution still fails. Options are as follows:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.17 Create OBS
Function
The Create OBS activity is used to create buckets or directories in OBS.
Configuration
On the Edit page, drag and drop the Create OBS activity to the edit grid area. Click the Create OBS activity.
- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-64.
Table 5-64 Link relationship between the Create OBS activity and the data sources
- Create OBS: Any data source -> Create OBS -> any data source
  The Create OBS activity can be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.
  NOTE: Connecting the Create OBS activity to a data source or activity is only used to form a complete pipeline. Running the Create OBS activity does not affect the connected data source or activity.
- On the configuration page that is displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-65 describes the Create OBS properties.
Table 5-65 Create OBS properties
- Name (mandatory): Activity name. Example value: Create_OBS_2113
- OBS Path (mandatory): Path to the OBS bucket or directory to be created. Example value: s3a://newbucket/<scheduletime>/
  - To create a bucket, enter the OBS bucket name following //. The OBS bucket name must be unique.
  - To create an OBS directory, select the location where the OBS directory is to be created, and enter the directory name following the path to the location. The directory name must be unique.
  OBS Path supports the following variables:
  - <scheduletime>: An OBS bucket or directory named after the time at which the pipeline starts running is automatically created.
  - <date>: An OBS bucket or directory named after the current date is automatically created.
  - <yesterday>: An OBS bucket or directory named after the previous day is automatically created.
- Log Path (optional): Path to the directory where logs are stored. Example value: s3a://dps/log/
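The OBS Path variables can be modeled as date-based substitutions applied before the path is used. The exact name format DPS generates is not specified here, so the timestamp and date formats in this sketch are assumptions for illustration only.

```python
from datetime import date, datetime, timedelta

def resolve_obs_path(path, schedule_time=None):
    """Expand <scheduletime>, <date>, and <yesterday> in an OBS path.
    schedule_time is when the pipeline run started (defaults to now);
    the formats used here are assumed, not DPS-defined."""
    schedule_time = schedule_time or datetime.now()
    today = date.today()
    return (path
            .replace("<scheduletime>", schedule_time.strftime("%Y-%m-%d_%H-%M-%S"))
            .replace("<date>", today.isoformat())
            .replace("<yesterday>", (today - timedelta(days=1)).isoformat()))

# Example matching Table 5-65's example value.
p = resolve_obs_path("s3a://newbucket/<scheduletime>/",
                     schedule_time=datetime(2018, 1, 30, 8, 0, 0))
```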
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-66 describes the advanced parameters.
Table 5-66 Advanced parameter settings
- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails to be executed. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that will be performed if the activity re-execution still fails. Options are as follows:
  - End the current job execution plan.
  - Proceed to the next job.
5.2.18 Delete OBS
Function
The Delete OBS activity is used to delete buckets or directories in OBS.
Configuration
On the Edit page, drag and drop the Delete OBS activity to the edit grid area. Click the Delete OBS activity.
- On the Input and Output tab pages on the left side of the edit grid area, check the input and output data sources to which the activity connects, as shown in Table 5-67.
Table 5-67 Link relationship between the Delete OBS activity and the data sources
- Delete OBS: Any data source -> Delete OBS -> any data source
  The Delete OBS activity can be connected to the Shell Script, CDM Job, Create OBS, and Delete OBS activities.
  NOTE: Connecting the Delete OBS activity to a data source or activity is only used to form a complete pipeline. Running the Delete OBS activity does not affect the connected data source or activity.
- On the configuration page that is displayed on the right side of the edit grid area, view and edit the configuration items in the following section.
Parameters
Properties
Table 5-68 describes the Delete OBS properties.
Table 5-68 Delete OBS properties
- Name (mandatory): Activity name. Example value: Delete_OBS_2113
- OBS Path (mandatory): Path to the OBS bucket or directory to be deleted. Example value: s3a://obs-6dc4/<scheduletime>/
  OBS Path supports the following variables:
  - <scheduletime>: The OBS bucket or directory named after the time at which the pipeline starts running is automatically deleted.
  - <date>: The OBS bucket or directory named after the current date is automatically deleted.
  - <yesterday>: The OBS bucket or directory named after the previous day is automatically deleted.
  NOTE: If an OBS bucket or directory is deleted, files stored in it are also deleted and cannot be restored. If you need to retain the files stored in the bucket or directory, back them up in advance.
- Log Path (optional): Path to the directory where logs are stored. Example value: s3a://dps/log/
Advanced settings
The advanced settings define the operation policy that takes effect if the activity fails to be executed. Table 5-69 describes the advanced parameters.
Table 5-69 Advanced parameter settings
- Retry upon Failure (mandatory): Whether to re-execute the activity if it fails to be executed. Default value: No.
  - Yes: Re-execute the activity and configure the following parameters:
    - Timeout Interval: Timeout interval for activity execution.
    - Maximum Retries: Number of retries upon an execution failure.
    - Retry Interval (seconds): Interval between two retries.
  - No: Do not re-execute the activity.
- Failure policy (mandatory): Operation that will be performed if the activity re-execution still fails. Options are as follows:
  - End the current job execution plan.
  - Proceed to the next job.
6 FAQs
6.1 What Is DPS?
DPS is one of the public cloud services. It helps you easily create and schedule pipelines. DPS has integrated with multiple cloud services, enabling you to conveniently use and transfer data stored in OBS and RDS. DPS allows you to create and schedule MRS-based data processing and analysis tasks.
6.2 Which Services Can DPS Schedule?
DPS can schedule the following services:
- OBS
- MRS
- RDS
- ECS
- DWS
- DIS
- CDM
- MLS
- UQuery
- ES
6.3 How Many Pipelines Can I Create Using the DPS Console?
By default, each user can create a maximum of 10 pipelines. If this quota cannot meet your requirement, you can apply for a higher quota.
6.4 What Can DPS Do?
- Using DPS, you can customize pipelines through simple drag-and-drop operations, schedule the execution of pipelines, and define the scripts and policies to be executed in case of task failures.
- DPS provides multiple data collection and processing methods, freeing you from complex pipeline compilation. This enables you to focus on data processing logic instead of programming.
- DPS supports connector creation and management. With this function, you can directly use a created and configured connector in a pipeline, eliminating the need for duplicate connector configurations.
- DPS offers pre-packaged templates, facilitating pipeline creation.
- DPS supports pipeline file import and export. It allows you to export pipeline files to your local PC and import pipeline files to create or edit pipelines.
- DPS provides the resource management function. Using this function, you can configure resource management and scheduling tasks to automatically create and delete resources.
6.5 What Is a Pipeline?
A pipeline is formed by a series of activities and data sources. Activities indicate actions performed on data; data sources indicate the locations of input and output data. Activities linked together are executed according to their linking sequence. That is, DPS executes the next activity only after the previous one is completed.
6.6 What Is a Data Source?
A data source indicates the location of data processed in a pipeline. For example, an OBS data source indicates data stored in OBS.
A Change History
2018-01-30: This issue is the fifth official release.
Added the following content:
- Creating an ESSource Connector
- Creating a CDM Resource
- UQuery Table
- ES Storage
- UQuery<->OBS
- MachineLearning
- Elasticsearch
- UQuery SQL
Modified the following content:
- Related Services
- Configuring DPS Agent
- Editing a Pipeline
- Scheduling a Pipeline
- Database<->HDFS
- Create OBS
- Delete OBS

2017-12-08: This issue is the fourth official release.
Added the following content:
- Basic Concepts
- Getting Started
- Creating a CDM Connector
- Creating a DIS Resource
- Creating an MRS Resource
- CDM Source
- Dummy
- ExecuteCDM
- Create OBS
- Delete OBS
- CDM Job
Modified the following content:
- Related Services
- Obtaining an AK/SK Pair
- Configuring DPS Agent
- Editing a Pipeline
- Monitoring a Pipeline
- Spark
- Hive
- RDS SQL
- HDFS->HBASE
- HDFS<->OBS
- MapReduce
- Database<->HDFS
- DWS SQL
- Which Services Can DPS Schedule?

2017-11-01: This issue is the third official release.
Modified the following content:
- Installation Flow
- Purchasing Elastic Cloud Server (ECS)
- Obtaining an AK/SK Pair
- (Optional) Connecting to DWS Cluster

2017-10-27: This issue is the second official release.
Added the following content:
- Connector Creation and Management
- Resource Creation and Management
- Installing DPS Agent
- Exporting a Pipeline
- Connector List
- Resource List
- DWS
- Shell Script
Modified the following content:
- Pipeline Creation and Management
- Related Services
- Permissions Required for Accessing DPS
- Buying a Pipeline
- Editing a Pipeline
- Monitoring a Pipeline
- RDS
- Activities

2017-08-26: This issue is the first official release.