Date post: 09-Aug-2020
Oracle Enterprise Data Quality Reference Data Writing Release 12.2.1
Oracle Enterprise Data Quality

Reference Data Writing

Release 12.2.1

Reference Data Writing, Release 12.2.1

Labs for Reference Data Writing Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

Table of Contents

Labs for Reference Data Writing

Lab 1: Writing to Reference Data

Labs for Reference Data Writing

Lab 1: Writing to Reference Data

Before You Start

The lab in this module relies on a specific file being present in the EDQ landingarea. The file that needs to be in the landingarea is called Titles-Names-Genders-90k.jmp.

The landingarea is a sub-directory of the oedq.local.home directory. The location of oedq.local.home can vary, depending on your machine’s operating system and the type of installation you have carried out. In the EDQ-12.2.1-Trn virtual machine, the landing area is at /apps/oracle12_2_1/fmw/user_projects/domains/base_domain/config/fmwconfig/edq/oedq.local.home/landingarea. On a Windows 7 machine, the default location of oedq_local_home is: C:\ProgramData\Oracle\Enterprise Data Quality\oedq_local_home.

The Titles-Names-Genders-90k.jmp file is in ~\edq_training_assets_12.2.1\Data Files\fundamentals.

1. If you are using the EDQ-12.2.1-Trn virtual machine, copy the following file from the edq_training_assets_12.2.1 folder to the following folder on the virtual machine: /apps/oracle12_2_1/fmw/user_projects/domains/base_domain/config/fmwconfig/edq/oedq.local.home/landingarea

a) edq_training_assets_12.2.1\Data Files\fundamentals\Titles-Names-Genders-90k.jmp

To do this, move to the desktop of the virtual machine, and double-click the shortcut to EDQ landingarea folder. The shortcut to EDQ landingarea window will be displayed. Next, double-click the Share folder. Within the Share folder, double-click the edq_training_assets_12.2.1 folder, then, within that folder, double-click the Data Files folder and then double-click the fundamentals folder. Drag the file listed above from the fundamentals folder to the shortcut to EDQ landingarea folder.

2. If you are not using the EDQ-12.2.1-Trn virtual machine, copy the following file from the edq_training_assets_12.2.1 folder to your EDQ instance’s landingarea:

a) edq_training_assets_12.2.1\Data Files\fundamentals\Titles-Names-Genders-90k.jmp

Note that the landingarea is a sub-directory of the oedq_local_home directory. The location of oedq_local_home can vary, depending on your machine’s operating system and the type of installation you have carried out. On a Windows 7 machine, the default location of oedq_local_home is: C:\ProgramData\Oracle\Enterprise Data Quality\oedq_local_home.
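
If you prefer to script the copy rather than drag the file, the step above can be sketched in Python as follows. This is only an illustration: the function name copy_to_landing_area is hypothetical, and both paths must be supplied by you, because the location of the landingarea depends on your installation.

```python
import shutil
from pathlib import Path


def copy_to_landing_area(source_file, landing_area):
    """Copy source_file into the landing_area directory.

    Both arguments are paths chosen by the caller; nothing here is
    specific to EDQ. Returns the path of the copied file.
    """
    source = Path(source_file)
    target_dir = Path(landing_area)

    # Fail early with a clear message instead of copying to a wrong place.
    if not source.is_file():
        raise FileNotFoundError(f"Source file not found: {source}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"Landing area not found: {target_dir}")

    destination = target_dir / source.name
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    return destination
```

For example, you would pass the full path of Titles-Names-Genders-90k.jmp as source_file and the full path of your landingarea directory as landing_area.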

Import the Package

1. In EDQ’s Director user interface, follow the menu path File >> Open Package File…

2. Navigate to the Learn Name-Gender Map.dxi file, which you will find in C:\share\edq_training_assets_12.2.1\Data Files\fundamentals. Select the file and click Open. A folder named Learn Name-Gender Map.dxi should appear at the bottom of the Project Browser.

3. Drag the Projects node from beneath the Learn Name-Gender Map.dxi folder and drop it on top of the Projects node directly below localhost (localhost is your server name).

4. A Learn Name-Gender Map project should appear in the Project Browser.

5. In the Project Browser, you can now right-click the Learn Name-Gender Map.dxi file and select Close Package File.

Examine the Learn Name-Gender Map Project

1. In the Project Browser, expand the Learn Name-Gender Map project.

2. Within the Learn Name-Gender Map project, double-click the Processes node to expand it, and then double-click the process called 1 - Prepare Names, Profile Names & Name-Gender Combinations to open it. The process prepares some customer data, and then profiles:

The frequency with which each customer’s first name appears in the data set (e.g. the first name ‘Lynn’ appears 100 times in the data set).

The frequency with which each name / gender combination appears in the data set (e.g. 90 of the customers with first name ‘Lynn’ are marked as female and 10 are marked as male).

3. Within the Learn Name-Gender Map project, double-click the process called 2 - Create Name-Gender Map Reference Data to open it. This second process uses the results of the first process’s profiling to mark names as either male or female. By default, names that are associated with a particular gender at least 90% of the time are mapped to that gender, provided that the name appears at least 10 times in the data. (Both of these default thresholds can be overridden using a run profile.) At the end of the process, a writer writes the name-gender map to a set of reference data that is also called Name-Gender Map.

A second writer additionally writes the name-gender map to staged data. The only purpose of this is to enable you to see the name-gender map from the Server Console if you choose to start a job containing this process from that user interface.

4. Within the Learn Name-Gender Map project, double-click the process called 3 - Gender Check Using Learned Gender to open it. This is a real-time process that reads from a web service. It accepts First Name as an input. It adds a ‘learned gender’ to the data by doing a lookup on the Name-Gender Map reference data that was created in the process called 2 - Create Name-Gender Map Reference Data. It does this by taking the supplied first name, looking it up in the reference data, and then returning the Gender that the name is mapped to. Finally, the process returns the learned gender to the web service, or indicates that it cannot find the first name in the reference data if that is the case.

5. Finally, explore the project’s reference data, and note that the following reference data sets are currently empty:

Name Frequencies.

Name – Gender Frequencies.

Name-Gender Map.

Just click or double-click a Reference Data set in the Project Browser to view its data (or, in this case, the lack thereof).
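
The profiling and thresholding described in steps 2 and 3 above can be sketched outside EDQ as follows. This is a minimal illustration under stated assumptions, not the processes' actual implementation: the function name build_name_gender_map, the (first name, gender) input format, and the 'F' / 'M' labels in the usage note are all hypothetical. Only the two default thresholds (90% and 10 occurrences) come from the lab text.

```python
from collections import Counter


def build_name_gender_map(records, min_share=0.9, min_count=10):
    """Map each first name to its dominant gender when both thresholds hold.

    records: iterable of (first_name, gender) pairs (assumed input shape).
    Returns {first_name: (gender, share)}, where share is the fraction of
    that name's records carrying the mapped gender.
    """
    pairs = [(name, gender) for name, gender in records]

    # Analogue of the "Name Frequencies" profile: occurrences per name.
    name_counts = Counter(name for name, _ in pairs)
    # Analogue of the "Name - Gender Frequencies" profile: occurrences
    # per (name, gender) combination.
    pair_counts = Counter(pairs)

    mapping = {}
    for (name, gender), pair_count in pair_counts.items():
        total = name_counts[name]
        share = pair_count / total
        # A name is mapped only if it is frequent enough AND one gender
        # dominates. With min_share above 0.5, at most one gender can
        # qualify per name.
        if total >= min_count and share >= min_share:
            mapping[name] = (gender, share)
    return mapping
```

With the figures from step 2 (90 records of ‘Lynn’ marked female and 10 marked male), the name meets both defaults and is mapped to the female label with a share of 0.9. A name that appears only 9 times is left out, whatever its split.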

Examine the Job

1. In the Project Browser, expand the Jobs node, and then double-click the job called MAIN - Learn Name-Gender Map and Run Real-Time Checker to open it. The job has two phases, called Call 2a - Shutdown Web Service and Call 2b - Learn Reference Data and Run Web Service respectively. The first phase calls a job that simply shuts down a web service if it is running. The second phase calls a job that does the heavy lifting. We’ll look at this job next.

2. Double-click the job called Part 2b - Learn Reference Data and Run Web Service to open it. The job has three phases.

The first phase is called Snapshot and Frequency Profile Names and Genders. This snapshots some existing customer data and runs the first process, which frequency profiles names and name-gender combinations.

The second phase is called Create Name-Gender Map Reference Data. This phase runs the second process, which creates the Name-Gender Map reference data.

The third phase is called Run Real-Time Checker. This phase runs the real-time process called 3 - Gender Check Using Learned Gender, which uses the Name-Gender Map reference data to add a gender based on a customer’s first name.

Run the Job

1. Close any processes or jobs that are currently open on Director’s Canvas.

2. In the Project Browser, right-click the job called 2 - After - Learn Name-Gender Map and Run Real-Time Checker and select Run. In the Tasks area (at the bottom-left of the Director user interface), you should see messages about the progress of the job.

3. Wait until the Tasks area indicates that the Real-Time Checker is running. This may take several minutes.

4. Once the Real-Time Checker is running, click the Name-Gender Map reference data in the Project Browser. Before you ran the job, this reference data was empty, but you should now find that it is populated.

Use the Gender Check Web Service

1. Navigate to the Enterprise Data Quality Launchpad.

2. From the Web Services drop-down, near the top-right of the screen, select Web Service Tester.

3. If necessary, log in as the dnadmin user (in the training virtual machine, the dnadmin user’s password is also dnadmin).

4. In the Project dropdown, select Learn Name-Gender Map. In the Service dropdown, select Gender Check WS.

5. On the left side of the screen, enter any First Name that is found in the Name-Gender Map reference data (for example, Alex, Rodney, Ronald, Ruth, Anita, Afzal, Bernadette, Beverly).

6. Click Send. Provided that the first name can be found in the Name-Gender Map reference data, the Learned Gender will be derived from that reference data, and the percentage of instances of that name that have the learned gender will be displayed. If you enter a first name that is not present in the reference data, then no gender will be returned, an N will be displayed in the Found in Learned Gender Map? field, and N/A will be displayed in the Percentage field.
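
The two outcomes described in step 6 can be mimicked with a small lookup function. This is a hedged sketch of the behaviour only, not the web service itself: the function name gender_check, the dictionary standing in for the Name-Gender Map reference data, the 'Y' value for a successful lookup, the percentage formatting, and exact-match (case-sensitive) comparison are all assumptions. The lab text only specifies the N and N/A values for names that are not found.

```python
def gender_check(first_name, name_gender_map):
    """Return the fields shown by the tester for one first name.

    name_gender_map: dict {first_name: (gender, share)} standing in for
    the Name-Gender Map reference data (assumed shape).
    """
    entry = name_gender_map.get(first_name)

    if entry is None:
        # Name not present in the reference data: no gender is returned.
        return {
            "Learned Gender": None,
            "Found in Learned Gender Map?": "N",
            "Percentage": "N/A",
        }

    gender, share = entry
    return {
        "Learned Gender": gender,
        "Found in Learned Gender Map?": "Y",  # assumed counterpart of "N"
        "Percentage": f"{share:.0%}",         # e.g. 0.97 is shown as 97%
    }
```

Calling gender_check with a name that is a key of the dictionary returns its mapped gender and percentage, while any other name yields the N and N/A values from step 6.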
