

Oracle Enterprise Data Quality
Fundamentals For Demoing Labs
Version 9.0, Issue 1.0

Table of Contents

Labs for Enterprise Data Quality Overview
Lab 1: Download and Install the EDQ Tool
Lab 2: Import Data
Lab 3: Create and Run a Process
Lab 4: Examine the Data
Labs for Profile
Lab 5: Profiling Data Using the Wizard
Labs for Audit
Lab 6: Auditing Valid Genders
Labs for Transform
Lab 7: Check Customers' Ages
Labs for Writing and Exporting Data
Lab 8: Write Customers Who Are Younger than 18 to a File
Labs for Automated Processing: Jobs
Lab 9: Jobs and Scheduling
Labs for Re-Using Your Work: Publishing, Packaging and Copying
Lab 10 (Optional): Publishing a New Processor
Lab 11 (Optional): Working With Packages
Labs for Introduction to the Customer Data Extension Pack
Lab 12: Examine a CDEP Verification and Transformation Processor
Labs for Realtime Processing Via Web Services
Lab 13: Configure and Test a Real Time Process Using a Web Service
Appendix: Extra Labs
Lab A1: Using the Server Console
Lab A2: Externalize Options and Set Overrides in a Run Profile
Lab A3: Create a Run Profile for Test Runs
Lab A4: The Lookup Check Processor
Lab A5: Use Group and Merge to Output Distinct Values
Lab A6: Cleaning Data for Group and Merge
Lab A7: Use a Data Interface in Jobs
Lab A8: Configuring the Dashboard
Lab A9: Users and Security
Lab A10: Audit Case Study: Assess Contact Details
Lab A11: Transformation Case Studies
Lab A12: Installing Oracle Enterprise Data Quality On Windows
Oracle Enterprise Data Quality Demonstration Assets
Version 6.08.2012



Labs for Enterprise Data Quality Overview

Lab 1: Download and Install the EDQ Tool

Prerequisites: An Oracle Technology Network account. It is free of charge to register for one:
1. Open your browser and navigate to the Oracle home page: http://www.oracle.com
2. At the top of the Oracle home page, click the link labeled Sign In/Register for Account.
3. Follow the instructions to create your Oracle Technology Network account.

Downloading Oracle EDQ
1. Navigate your browser to the Oracle EDQ download page: http://www.oracle.com/technetwork/middleware/oedq/downloads/index.html
2. From the Oracle EDQ download page, click the radio button to accept the License Agreement.
3. Click the button labeled Download File, located under the Oracle Enterprise Data Quality Customer Data Services Pack 9.0.3 Zip Distribution text.
4. Save the .zip file to your hard drive.
5. Extract (unzip) the downloaded EDQ file to a location of your choosing on your computer.
6. Install the product by running (double-clicking) the file named dnDirectorSetup.exe from the location where you extracted the EDQ files in step 5.
7. Follow the installation directions, making a note of the password you may have assigned to the dnadmin login ID.

Test Your Installation: Launch Director
1. After downloading and installing EDQ, test the installation by launching EDQ Director. From the Microsoft Windows Start menu, locate the menu item labeled Enterprise Data Quality and click it to expand and display all of the installed EDQ tools.

2. Click the Director link to launch the EDQ application. Log in to the application using the dnadmin username and its password (the default username and password are both dnadmin, unless you changed this during installation). Allow any Java Web Start confirmation boxes that may appear so that the client software can be downloaded. If you are asked whether you want to run the application, click Run. If a dialog box warns you that the Windows Firewall has blocked some aspects of the program, click Unblock.

3. After you have logged in, the main Director application window should be displayed.

Lab 2: Import Data

Create a Project
1. Launch EDQ Director (using the steps from Lab 1, Test Your Installation: Launch Director). Note: if you encounter a certificate error, navigate past it by clicking Continue to this website (it is a local service and does not actually access anything outside of your laptop).
2. Click the Director icon on the EDQ Launchpad. When presented with the login screen, enter the login and password for the dnadmin user that you created during installation of the tool (the default password is welcome unless changed). Note: if you encounter a certificate error, navigate past it by clicking Yes.
3. Once EDQ Director has loaded, in the Project Browser, right-click Projects and select New Project...

4. Give the new project a name of My Customer Data Demo and add a description. Click Next.
5. Ensure that All Groups is checked in Project Permissions. This will ensure any user can view and use the project. Click Finish to create the new project.

Add a Data Store
1. Expand the newly created project, right-click Data Stores and select New Data Store.
2. Select Client and Text files in the two dropdown boxes, but note what other options are available.
3. In the Type list, select Delimited text files and click Next.
4. On the Data Store Configuration screen, click the browse button to the right of the File text field. The Select dialog opens. Navigate to the US Customers.csv file (if you are using the folder structure from the eRoom or SharePoint site, it should be located in the Fundamentals folder). Click Select.
5. Select Treat first row as header. Leave the Field separator set to Comma and leave all the other fields set to their defaults.
6. Test the connection to the .csv file by clicking the Test button in the lower right-hand corner of the screen. Click Next.
7. Give the data store a name of Connection to CSV File and a suitable description. Click Finish to save the data store. (For a rough idea of what these delimited-file settings mean outside the tool, see the sketch below.)
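For readers who want to see what these Data Store settings correspond to outside the tool, here is a minimal sketch in Python. The pandas library and the local file location are assumptions for illustration; EDQ itself does not use this code.

    # A rough stand-in for the "Delimited text files" Data Store settings.
    import pandas as pd

    customers = pd.read_csv(
        "US Customers.csv",   # assumes the file is in the working directory
        sep=",",              # Field separator: Comma
        header=0,             # Treat first row as header
        dtype=str,            # keep values as raw strings, as a snapshot would
    )
    print(customers.head())  # a quick "Test"-style check that the file parses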

Add a Snapshot
1. Right-click Staged Data and select New Snapshot...
2. Select your newly created data store to connect to the CSV file. This is where the data for the snapshot will come from.
3. Click Next to navigate through the next five screens, accepting all of the defaults.
4. On the final screen, give your new snapshot a name of Customers. Click Finish to create the snapshot.
Notice that after a short delay while the data is read, the Results Browser is populated with data from the CSV file.

Lab 3: Create and Run a Process

Create a Process
1. Right-click Processes and select New Process...
2. Processes start with a Reader, and this needs a source for its data. Select the snapshot that you created in the previous exercise and click Next.

3. In Analysis, just click Next to continue (we will come back to what this means later).
4. Give your process a name of Exploring Customer Data and click Finish to create the process.
Notice that the Process Canvas becomes active, the process is created and a Reader processor appears. The green circular arrow icon on any processor means it has yet to be executed.

Add a Processor
1. In the Tool Palette, enter the word quick into the Search box. This will allow you to quickly find the Quickstats Profiler.

2. The Quickstats Profiler belongs to the Profiling family of processors. These are stored together under the Profiling icon.

3. Drag the Quickstats Profiler onto the process canvas, dropping it to the right of the Reader.
4. Hover over the output triangle of the Reader processor. The pointer icon turns to a hand and tool-tip information about the processor appears. Click and drag from the output triangle of the Reader processor so that the connector line reaches the input triangle of the Quickstats Profiler. The Quickstats Profiler dialog will appear.

5. Note the message in red informing you that this processor needs at least one attribute to be set as an input.
6. Click the Select All icon. This will select all of the input attributes. Click OK to save the processor. Note that the new processor also shows the green not-yet-run icon.
Once a process is run, its results are stored in the Enterprise Data Quality repository and the green not-yet-run icons disappear. As subsequent processors are added, only these will need to be run, saving time, provided the other processors are left unchanged.

Run the Process
1. The process now has a Reader and a Quickstats Profiler, both with their green not-yet-run icons displayed. Click the Run icon in the toolbar to run the process. Note the progress in the Task bar at the bottom-left of the screen as the process runs. When the process has finished, the not-yet-run icons disappear to show that the processors have data associated with them.

Lab 4: Examine the Data

View the Data in the Results Browser
1. Click the Reader processor to see the input data stored as a snapshot. This will be displayed in the Results Browser.
2. Next, click the Quickstats Profiler to see the output of the processor.

The Quickstats Profiler produces a summary of the data. For each attribute, it displays the number of records without data, the number of distinct values, the number of singletons and the number of duplicate values. Blue text is hyperlinked; you can click it to drill down to the data underneath. (For a rough illustration of how these counts are derived, see the sketch below.)
3. Click the hyperlink for the 10 Distinct Values for the Country attribute.
4. A second summary appears showing details for those 10 distinct values. Click the hyperlink marked Canada to see the actual data relating to this country.
5. The 39 rows of data with a value of Canada in the Country field are displayed. The Quickstats Profiler has quickly enabled us to find these anomalies in what seemed to be US-only data.
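To make the Quickstats metrics concrete, here is a rough pandas sketch of the per-attribute counts the profiler reports. This illustrates the statistics only; it is not EDQ's implementation, and the duplicate count here means "distinct values that occur more than once".

    import pandas as pd

    customers = pd.read_csv("US Customers.csv", dtype=str)

    for col in customers.columns:
        values = customers[col]
        counts = values.dropna().value_counts()  # occurrences per distinct value
        print(
            col,
            "| without data:", int(values.isna().sum()),
            "| distinct:", int(counts.size),
            "| singletons:", int((counts == 1).sum()),
            "| duplicates:", int((counts > 1).sum()),
        )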

Create and Export a Results Book
1. Click the Add page in Results Book icon in the Toolbar of the Results Browser.
2. Click the New Results Book... button.
3. Name the new results book Canadian Customers and click Finish.
4. Click Next to save a page in this book.
5. Click Finish, using the default mapping as it is displayed. Note that the new Results Book appears in the Project Browser. Any number of pages can be added to this book.
6. Export the Results Book to MS Excel by clicking the Export to Excel icon in the Results Book toolbar.
7. Find the exported file and ensure it loads.
8. Go back to the Quickstats Profiler and see if you can identify more records that seem to relate to Canadian customers. Add a second page in your results book for these, and then work out how to export all of the pages in your Results Book to a single Excel workbook.

Labs for Profile

Lab 5: Profiling Data Using the Wizard

In this practice you create a new process with profiling and study the results.

Profile Data
1. In the Project Browser, right-click Processes and select New Process...
2. Processes start with a Reader, and this needs a source for its data. Select your Customers snapshot and click Next >.
3. In Analysis, select the Add Profiling checkbox and then click Next >. (Do not select the four profilers that are unselected by default.)
4. Give your process a name, such as Customer Profiling Analysis, and click Finish to create the process.
5. Run the process. The results should appear in the Results Browser.
6. To examine the results, click each profiler in turn to display its results.

Quickstats Profiler
1. What do you observe about the email field?

2. What do you observe about the country field?

Data Types Profiler
3. What do you notice about the format of the ZIP field?

Max/Min Profiler
4. What indications do the outliers in the Name attribute give you about the quality of data in this field?

5. What potential data quality issues are indicated by the outliers in the Street field?

Frequency Profiler
6. What unique values appear in the gender field?

7. What problems do you notice in the Active attribute?

8. What do all of the start and end dates appear to have in common?

Patterns Profiler
9. What does the patterns profiler tell you about the ZIP field?
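As a hint for the question above: a patterns profiler reduces every value to its "shape", typically by mapping each letter and each digit to a single symbol, so that values with the same structure collapse to one pattern. A small sketch of the idea (the symbols chosen here are illustrative, not EDQ's exact notation):

    from collections import Counter

    def pattern(value: str) -> str:
        # Map letters to 'a' and digits to '9'; keep punctuation and spaces.
        return "".join(
            "a" if ch.isalpha() else "9" if ch.isdigit() else ch
            for ch in value
        )

    zips = ["90210", "10014-1234", "SW1A 1AA"]
    print(Counter(pattern(z) for z in zips))
    # A 5-digit US ZIP becomes '99999'; a ZIP+4 becomes '99999-9999'.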

Record Completeness Profiler
10. Look at the record completeness. Is this good or bad?

7. Add an issue to make an 'action record' for one of the problems you have found during this lab. To do this, right-click a processor's results in the Results Browser and choose Create Issue...

Labs for Audit

Lab 6: Auditing Valid Genders

In this practice you will use Audit processors to assess whether or not the value in the gender attribute is valid.
1. Returning to your Exploring Customer Data process, find the List Check processor and drag it onto the Process Canvas.
2. Connect it to the output of the Quickstats Profiler. The Configuration dialog will open.
3. In the Search box below the Available Attributes panel, type ge and note how the listed attributes are filtered down.
4. Highlight the GENDER attribute and click > to select it.
5. Click OK to save the processor.
6. Double-click the processor's name at the bottom of the icon. Type in a new name of Gender Check.
7. Single-click the Quickstats Profiler on the Process Canvas and, in the Results Browser, drill down on the Distinct Values for the Gender attribute.
8. In the Results Browser, select the values M and F. Right-click them and select the option Create Reference Data. The New Reference Data dialog appears.
9. Click Next > to accept the Column Name of Gender.
10. Select the Gender attribute as the Lookup Column and click Next.
11. Click Next > again without selecting a return column.
12. Click Next > again without selecting a category.
13. Give your Reference Data a Name of Valid Genders and click Finish.
14. The Reference Data Editor appears. Check that there are two rows, one for the value 'M' and the other for the value 'F', and click OK.
15. Double-click the Gender Check processor on the Process Canvas. Move to the Options tab and click the selection button. Select the Valid Genders reference data that you created above, click OK, and then click OK again.
16. Run the process. Check its results, drilling down if necessary.
1. How many valid records are there and how many unknown records are there?

2. Drilling down on the records without a valid gender, do you think it would be possible to derive a gender, and if so, how?

17. Change the name of the flag added by the Gender Check processor to ValidGender. Run the process again and check that the flag has been added to your results. To do this, drill down in the Results Browser until you see the underlying data records, then click the Show or hide flags (if available) icon in the Results Browser's toolbar. (A minimal sketch of the list-check logic follows.)
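Conceptually, the List Check processor is a membership test against the reference data. A minimal sketch of that logic, assuming pandas and the column names used in this lab:

    import pandas as pd

    customers = pd.read_csv("US Customers.csv", dtype=str)
    valid_genders = {"M", "F"}  # the Valid Genders reference data from this lab

    # Valid if the value appears in the reference list; everything else
    # (including missing values) is treated as unknown here.
    customers["ValidGender"] = customers["GENDER"].isin(valid_genders)
    print(customers["ValidGender"].value_counts())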

Labs for Transform

Lab 7: Check Customers' Ages

In this lab, we are going to add several processors to check that the customers are at least 18 years old. To do this, we will compare the customers' dates of birth with the current date, which we will add to the records. In order to make this comparison, we must first convert the Date of Birth attribute, which is held as a string, to the date format. (A sketch of the underlying arithmetic appears at the end of this lab.)

Convert the DoB Attribute to Date Format and Profile It
1. Locate the Convert String to Date processor in the Tool Palette and drag it into your Exploring Customers process (the Convert String to Date processor is in the Transformation family). Wire the Convert String to Date processor up to the output of the Reader. The Convert String to Date dialog opens.
2. Select the DoB attribute and then move to the Options tab.
3. Click the selection button beside List of recognized date formats. The Select Resource dialog opens. De-select the Show project level only checkbox and then select the *US Date Formats reference data. Click OK to close the Select Resource dialog, and click OK again to close out of the Convert String to Date dialog.
4. Locate the Date Profiler processor in the Tool Palette and drag it onto the Process Canvas. Wire the output from the Convert String to Date processor's Successful port to the Date Profiler's input port. Select the DoB.StringToDate attribute (this is the new attribute that will be created by the Convert String to Date processor) and click OK to close the dialog.
5. Run the process and study the Date Profiler's results in the Results Browser.
Does the Date Profiler reveal any interesting patterns or anomalies in the Date of Birth?

Add the Current Date and Calculate Your Customers' Ages
1. Locate the Add Current Date processor in the Tool Palette and drag it into your Exploring Customers process (it is in the Transformation family). Wire the Add Current Date processor up to the output of the Date Profiler.
When the process is run, the Add Current Date processor will timestamp each of the records with the current date and time. This information will be held in a new attribute called ProcessingDate.
2. Locate the Date Difference processor in the Tool Palette and drag it onto the Process Canvas (it is in the Transformation family). Wire it up to the output from the Add Current Date processor. The Date Difference dialog opens.
3. Select the ProcessingDate attribute in the Subtract From field, and select the DoB.StringToDate attribute in the Subtracted field.
4. Move to the Options tab and select the Yes radio button for Whole Years? Leave all other radio buttons and fields set to their default values and click OK to close the dialog.
When the process is run, the difference between each customer's date of birth and the current date will be calculated and rounded down to a whole year. The new field that is created (called WholeYears) will hold each customer's age in years at the point at which the process was run.

Check to See Whether Your Customers Are 18 or Over
1. Locate the Value Check processor in the Tool Palette and drag it into your Exploring Customers process (it is in the Audit family). Wire the processor up to the output of the Date Difference processor. The Value Check dialog opens.
2. Select the WholeYears attribute as the Field for validation.
3. Move to the Options tab and enter a Value to compare records against of 18. Select a Comparison operator of Is greater than or equal to. Leave all other fields set to their default values and click OK to close the dialog.
When the process is run, the customers who are younger than 18 will fail this audit check. Those who are 18 or older will pass.
4. Run the process and study the Value Check's results in the Results Browser.
How many customers are younger than 18?

How old is the youngest customer?
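The three processors in this lab amount to: parse the string into a date, subtract it from today, and round down to whole years. A sketch of that arithmetic in plain Python; the format strings below are assumptions standing in for the *US Date Formats reference data:

    from datetime import date, datetime

    US_DATE_FORMATS = ["%m/%d/%Y", "%m-%d-%Y"]  # stand-ins for *US Date Formats

    def to_date(text):
        # Convert String to Date: try each recognized format in turn.
        for fmt in US_DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).date()
            except ValueError:
                pass
        return None  # would flow out of the processor's unsuccessful port

    def whole_years(dob, today):
        # Date Difference with Whole Years: completed years, rounded down.
        years = today.year - dob.year
        if (today.month, today.day) < (dob.month, dob.day):
            years -= 1
        return years

    dob = to_date("06/15/2010")
    # The Value Check: Is greater than or equal to 18.
    print(whole_years(dob, date.today()) >= 18)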

Labs for Writing and Exporting Data

Lab 8: Write Customers Who Are Younger than 18 to a File

In the previous module, you identified customers who were younger than eighteen. You are now going to stage this data and export it to a text file. (A compact sketch of the overall filter-and-write outcome follows at the end of this lab.)

Write to Staged Data
1. Navigate to your Exploring Customers process. Add a Writer processor and connect it to the Value Check processor's Fail port.
2. Configure the Writer processor by selecting all of the available attributes.
3. Remove the DoB attribute (we don't want this, as we have added the DoB.StringToDate attribute, which is correctly formatted). You may also want to remove the DayOfWeek, DayOfMonth, DayOfYear, Month and Year attributes, which were added by the Date Profiler. Ensure that the DoB.StringToDate attribute is still listed.
4. Click and drag the WholeYears attribute to the top of the list.
5. At the top-right of the configuration dialog, leave the Type as Staged Data and click + to create a new Staged Data object.
6. Change the name of the WholeYears attribute to Age. Leave the rest of this dialog as it is and click Next > to continue.
7. Give the new Staged Data a name of Underage Customers, click Finish and then click OK.
8. Run the process and note the position of the Age attribute in the Results Browser.
9. The data has been written out to temporary tables in the Enterprise Data Quality repository. It will persist until purged, refreshed or deleted.
Note that an Underage Customers Staged Data node appears in the Project Browser.

Create a New Data Store
1. Right-click Data Stores and select New Data Store...
2. Ensure the Data is accessed from dropdown list is set to Client.
3. Ensure the Category is set to Text files. Choose Delimited text files and click Next >.
4. Click the browse button to open the Select dialog. Navigate to your laptop's desktop so that the underage customers file will be saved there. Enter a name of Underage Customers.csv in the File Name field. Click Select.
Underage Customers.csv will be the name of the text file into which the records will be output.
5. Select the Treat first row as header check-box.
6. Ensure the Field Separator of Comma is selected and then click Next >.
7. Give your new Data Store a name of Write Underage Customers to File and click Finish.

Export a File
1. Right-click Exports and choose New Export...
2. Select the Underage Customers staged data that you created earlier and click Next.
3. Select your Write Underage Customers to File Data Store and click Next.
4. Enter a Name of Export Underage Customers. Select the Run now? checkbox, and click Finish to save your export configuration and run the export at the same time.
5. Locate the Underage Customers.csv file on your desktop and check through its contents.
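Taken together, the Writer and Export stages of this lab behave like "filter the failing records and write them to a delimited file". A compact pandas sketch of that outcome; the column names come from the lab, and the age arithmetic here is a deliberate approximation of the Date Difference processor:

    import pandas as pd

    customers = pd.read_csv("US Customers.csv", dtype=str)

    # Approximate WholeYears (see Lab 7): days divided by 365.25, rounded down.
    dob = pd.to_datetime(customers["DoB"], errors="coerce")
    customers["Age"] = (pd.Timestamp.today() - dob).dt.days // 365.25

    # The Value Check's Fail port: customers younger than 18.
    underage = customers[customers["Age"] < 18]

    # The Export: a comma-separated file with a header row.
    underage.to_csv("Underage Customers.csv", index=False)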

Labs for Automated Processing: Jobs

Lab 9: Jobs and Scheduling

STUDENT NOTE: This exercise references the cloud/virtual-machine-based Customers.csv file. These exercises do not use cloud/server-side input or output files, so for the purposes of learning how to schedule jobs, the training content has been modified to provide additional directions that emulate the cloud/server-side input and output files.

You are now going to create a job that will import a snapshot of customer data, identify your underage customers, and then export them to a file; in other words, it will automate the tasks you have configured in the previous labs. Snapshots and exports can only be included in jobs if the data stores (connectors) that they are associated with access data on the server side, rather than the client side. The Data Stores you created earlier were client side, so before we create a job, we will first create new Data Stores for import and export respectively. These will both be server side. Fortunately, the US Customers.csv file that holds the demo data is already in the virtual machine's landing area, so we will not have to move it there. We will, however, also need to change one of your snapshots so that it uses the server-side data store for import.

Create a New Data Store for Import
1. Within your project, right-click Data Stores and select New Data Store.
2. Select Server and Text Files respectively in the two dropdowns.
3. Select Server-based delimited Text Files as the Type and click Next >.
4. Additional step to emulate server-side files: copy the US Customers.csv file from the SharePoint site to this location on your local computer: C:\Program Files\Datanomic\dnDirector\config\landingarea
5. Enter US Customers.csv in the File in server work area field and also select the Treat first row as header check box. Test the connection to the file by clicking Test. If you can connect successfully, click Next >.
6. Give the Data Store a Name of Server Side Connection to Customers File.
7. Click Finish to save the data store.

Alter the Configuration of Your Snapshot
1. If necessary, expand the Staged Data node in the Project Browser so that you can see your snapshot of the customers table (it should be called Customers). Right-click your snapshot and select Edit...
2. In the Setup Snapshot dialog, select the Server Side Connection to Customers File Data Store that you created above. Click Next > five times and then click Finish. Your snapshot will now import data through your new server-side Data Store.

Create a New Data Store for Export
1. Right-click Data Stores and select New Data Store.
2. Select Server in the Data is accessed from dropdown and Text Files in the Category dropdown.
3. In the Type list, select Server-based delimited Text files and click Next >.
4. In the File in server work area field, enter a name of Underage Customers.csv and note that the landing area location is displayed directly beneath this field.
5. Select the Treat first row as header checkbox.
6. Ensure that Comma is selected as the Field separator.
7. Leave all of the other fields set to their default values and click Next >.
8. Give the data store a name of Write Underage Customers to Landing Area and click Finish to save the data store.

Create a New Export
1. Right-click Exports and select New Export...
2. Select your Underage Customers staged data and click Next >.
3. In Data Stores, select the Write Underage Customers to Landing Area Data Store that you created above and click Next >.
4. Give your new export a name of Server Side Export of Underage Customers and click Finish to save.

Create a New Job
1. Right-click the Jobs node and select New Job... The New Job dialog will open.
2. Enter a Name of Automated Underage Customer Detection and click Finish. The dialog will close.
3. At the top of the process canvas, double-click the words New Phase. The Phase Configuration dialog will open.
4. Change the Phase Name to Import Customer Data and click OK.
5. In the Tool Palette, expand the Snapshot node. Drag the Customers snapshot onto the Process Canvas underneath the phase you have just renamed.
6. Click to create a new phase.
7. Click at the top of the process canvas and, in the Phase Configuration dialog, change the Phase Name to Identify Underage Customers. Ensure the EXECUTE_ON_SUCCESS condition remains selected and click OK.
Selecting EXECUTE_ON_SUCCESS means that the phase will only be executed if the previous phase completed successfully. (A toy sketch of this control flow appears after the Event Log section below.)
8. In the Tool Palette, expand the Process node. Drag the Exploring Customer Data process onto the Process Canvas underneath the phase you have just renamed.
9. Click to create a new phase.
10. Click at the top of the process canvas and, in the Phase Configuration dialog, change the Phase Name to Export Underage Customers. Ensure the EXECUTE_ON_SUCCESS condition remains selected and click OK.
11. In the Tool Palette, expand the Export node. Drag the Server Side Export of Underage Customers export onto the Process Canvas underneath the phase you have just renamed.
12. Click to save your new job.

Run Your New Job
1. In Director, ensure that you can see the Tasks area. This is directly below the Project Browser. Resize the Tasks area if necessary (if you can't see the Tasks area at all, ensure it is selected in the View menu). During the next step, keep watching the Tasks area in order to see your job running.
2. In the Project Browser, right-click your Automated Underage Customer Detection job and select Run. You should briefly see the job running in the Tasks area.
3. Check that your job has created the Underage Customers.csv file in the landing area of the virtual machine. Open the file and inspect its contents. The location of the landing area on your virtual machine is: C:\Program Files\Datanomic\dnDirector\config\landingarea

Inspect the Event Log
1. In Director, click the Event Log icon. The top Events (rows) in the log should relate to the job that you have just run. You should be able to see events for the start and the end of the job. The snapshot, process and export each have an event, and there are also events for System Tasks, such as indexing (look at the entries in the Task Name and Task Type columns). Double-click the row for the job end, and a report should open in your internet browser. Examine the information that is presented, and note that this includes the number of records read by the reader and written by the writer.
2. Return to the Event Log.
Note that if your job had experienced errors, the status of one or more Events would be ERROR and those events would be displayed with red text. You would double-click these events in order to display detail that would help you to fix the error. Also note that you can filter what is displayed in the Event Log to make it easier to find particular events. You can filter by variables including Event Type, Status (for example, ERROR), Task Type, Project Name, Job Name and Start Time.
3. Close the Event Log and return to Director.

Schedule Your Job to Run Automatically
1. In the Project Browser, right-click your Automated Underage Customer Detection job and select Schedule Job...
2. In the Schedule dialog, schedule your job to run daily at 23:59 and then click OK to close the dialog.
3. Open the Schedules dialog. You should see that your new job is scheduled to run on a daily basis. Click Close to exit from the dialog.
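The EXECUTE_ON_SUCCESS condition simply gates each phase on the outcome of the one before it. A toy sketch of that control flow; the phase functions are placeholders, not EDQ APIs:

    def run_job(phases):
        # Each phase runs only if every earlier phase succeeded.
        for name, task in phases:
            try:
                task()
                print(f"Phase '{name}' succeeded")
            except Exception as err:
                print(f"Phase '{name}' failed: {err}; skipping the rest")
                return False
        return True

    run_job([
        ("Import Customer Data", lambda: None),         # snapshot
        ("Identify Underage Customers", lambda: None),  # process
        ("Export Underage Customers", lambda: None),    # export
    ])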

Labs for Re-Using Your Work: Publishing, Packaging and Copying

Lab 10 (Optional): Publishing a New Processor

In this practice you will make and publish a processor.

Make Part of a Process into a Processor
1. Open your Exploring Customers process on the Process Canvas.
2. Select the Add Current Date, Date Difference and Value Check processors on the Process Canvas. Right-click them and select Make Processor. A New Processor tab opens, and the right-hand side of the final processor should be shaded in a green color. This indicates that your composite processor's output will come from this processor.
3. Open the Processor Setup dialog and navigate to the Inputs tab. The External Label is the field name that users will see when they connect your processor to a reader or another processor. By default, the external label is set to the attribute name from the snapshot used to create the processor. Change the External Label to Date of Birth, and then click OK twice to close the dialogs.
4. In the Process Canvas, navigate to the Process sub-tab. You should now see your new processor attached to the Date Profiler.
5. Right-click your new processor on the canvas and select Configure.
6. In the New Processor dialog, navigate to the Icon and Family tab, and note that you can select an icon of your choosing for the new processor. You can also allocate your new processor to any of the existing processor families (by default it belongs to the Published Processors family).
7. Click OK to close the New Processor dialog.

Name and Publish Your Processor
1. Right-click your new processor on the Process Canvas and select Publish Processor. In the Enter Name dialog, enter a name of My Age Checker and click OK.
2. In the Tool Palette, view the Published Processors family. Your new processor should be displayed. You can now drag it into other processes and use it with different data sets.

Lab 11 (Optional): Working With Packages

In this practice you will create and import a package.

Create a Package
1. In the Project Browser, navigate to your My Customer Data Demo project. Right-click it and select Package. If any dialog boxes appear inviting you to save changes, click Yes.
2. Save your package to your machine's desktop.
3. Verify that your package is present on your machine's desktop (it should be called My Customer Data Demo.dxi). Examine its file size. Since it only contains configuration and reference data, and not working data, it should be of a small, portable size.

Import a Package
1. Right-click at the bottom of the Project Browser and select Open Package File, or alternatively follow the menu path File >> Open Package File.
2. Navigate to the file US SSN Check processor.dxi, which you will find in C:\share\edq_training_assets\Data Files\fundamentals. Select the file and click Open. A folder named US SSN Check processor.dxi should appear at the bottom of the Project Browser.
3. Drag the Projects node from beneath the US SSN Check processor.dxi folder and drop it on top of the Projects node directly below localhost (localhost is your server name).

4. A US SSN Check processor project should now have appeared in the Project Browser below the Transliteration Demo project.

5. Expand the US SSN Check processor node within the Project Browser and, within it, expand the Processes node. Double-click the US SSN Check process to open it on the Process Canvas.

6. Right-click the US SSN Check processor and select Open.

7. The packaged .dxi file that you imported contains a published processor to check for valid United States Social Security Numbers.

Labs for Introduction to the Customer Data Extension Pack

Lab 12: Examine a CDEP Verification and Transformation Processor

In this lab we will take a look at an example of a CDEP verification and transformation processor. Please note that we cannot cover all of the CDEP processors in the training course, so it is important for you to use the online help to find out about the processors that we do not cover in this course.
1. Create a new project called CDEP and within it create a new data store to connect to the file customer_file2.txt, which you will find in C:\share\edq_training_assets\Data Files\fundamentals. customer_file2.txt is a tab-delimited text file with a header.
2. Snapshot the data in the customer_file2.txt file.
3. Create a new process within your CDEP project. Choose the Snapshot you created in the previous step as your data source.
4. In the Tool Palette, locate the CDEP processors. Drag the Country from City processor onto the canvas, hook it up to the reader and select City as the field to match.
5. Run your process and study the results.
How many records is the processor able to enhance, and what value does it place in the Enhanced Result attribute?

Double-click the Country from City processor to open it and navigate to the Options tab. View the supplied reference data. Close the Reference Data Editor and note that it is possible to associate the processor with other reference data. (A toy sketch of this kind of lookup follows.)
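Under the hood, an enrichment processor like Country from City is a reference-data lookup: match the city, return the country. A toy sketch of the pattern; the mapping below is a tiny invented sample, not the CDEP reference data:

    # Illustrative only: a miniature city-to-country lookup table.
    CITY_TO_COUNTRY = {
        "new york": "United States",
        "toronto": "Canada",
        "london": "United Kingdom",
    }

    def country_from_city(city):
        # Normalize, then look up; None models an unenhanced record.
        return CITY_TO_COUNTRY.get(city.strip().lower())

    print(country_from_city("Toronto"))  # -> Canada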

Labs for Realtime Processing Via Web Services

Lab 13: Configure and Test a Real Time Process Using a Web Service

In this lab, you will create a Web Service that will input three values to a realtime process. These values will be title, first name and stated gender. Your realtime process will use the Add Gender processor from the Customer Data Extension Pack to derive a gender from the title and first name. It will then return the derived gender to the Web Service along with the three original values. Once your configuration is complete, you will use Enterprise Data Quality's Web Service Tester to test your Web Service. You will then view the Web Service's WSDL file (description) in a web browser. The WSDL (which stands for Web Service Description Language) provides developers with all of the information they would need to build interfaces to your Web Service.

Create a Web Service
1. Right-click the Web Services node within the Project Browser and select New Web Service...
2. In the Web Service Inputs dialog, create a string attribute called Title.
3. Create a second string attribute called FirstName.
4. Create a third string attribute called StatedGender.
5. Click Next >. In the Web Service Outputs dialog, click Use Inputs. This should automatically create three output attributes with the same names as your input attributes. Add a fourth string output attribute called DerivedGender.
6. Click Next > and name your Web Service Gender Check. Then click Finish.

Create a Realtime Process
1. Right-click the Processes node within the Project Browser and select New Process...
2. In the Data Source dialog, select Realtime from the dropdown. Choose your Gender Check Web Service and then click Next >.
3. Click Next > through every screen in the wizard, accepting the defaults.
4. In the Tool Palette, select the Customer Data (Customer Data Extension Pack, or CDEP) family.
5. Drag the Add Gender processor onto the canvas and connect it to the reader. Associate the Title, FirstName and StatedGender attributes with the appropriate fields and then click OK.
6. Drag a Writer onto the Process Canvas and connect it to the Add Gender processor.
7. Configure the writer so that the Title, FirstName, StatedGender and DerivedGender attributes are the Selected attributes for writing. In the Type dropdown, select Realtime and in the Name dropdown, select Gender Check.
8. Click Map By Type. The selected outputs for writing should be automatically mapped to the appropriate outputs of the Gender Check Web Service. If this is not the case, manually connect them.
9. Click OK to close the Writer Configuration dialog and run your process. It should run continually as a daemon.

Test Your Web Service
1. Navigate to the Enterprise Data Quality Launchpad.
2. If the Web Service Tester does not appear on your Launchpad, click Server Configuration, input your username and password, and then click Applications. Select the checkbox to the left of Web Service Tester. Click Save and then click Home.
3. From the Enterprise Data Quality Launchpad, launch the Web Service Tester and log in as dnadmin (the password is welcome).
4. In the Project dropdown, select the project you have been working with. Select your Gender Check Web Service in the Web Service dropdown.
5. Click Get WSDL (WSDL stands for Web Service Description Language).
6. On the left-hand side of the screen, under Web Service Inputs, you should be able to input values for Title, FirstName and StatedGender. Try inputting values such as Mr, John, F.
7. Click Send. The Web Service should deliver the values you input to your process and return the results to the right-hand side of the Web Service Tester screen, adding a Derived Gender to the input attributes. Check that the process returns the results you would expect. (If you prefer to call the service from code, see the sketch below.)
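If you would rather exercise the Web Service from code than from the Web Service Tester, a SOAP request can be posted directly to the service endpoint. The sketch below is illustrative only: the endpoint URL, namespace and element names are placeholders you would read from your real WSDL, and the requests library is an assumption:

    import requests

    # Placeholder endpoint and payload: take the real values from your WSDL.
    url = "http://localhost:8080/edq/webservices/GenderCheck"
    envelope = """<?xml version="1.0"?>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Body>
        <GenderCheck>
          <Title>Mr</Title>
          <FirstName>John</FirstName>
          <StatedGender>F</StatedGender>
        </GenderCheck>
      </soapenv:Body>
    </soapenv:Envelope>"""

    response = requests.post(url, data=envelope,
                             headers={"Content-Type": "text/xml"})
    print(response.text)  # should echo the inputs plus a DerivedGender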
View Your Web Service's WSDL File
1. Return to the Director user interface. In the Project Browser, right-click the name of your Web Service and select Copy WSDL URL to clipboard.
2. Open your web browser. Paste the contents of the clipboard into the address bar and navigate to your Web Service's WSDL. The WSDL should display in your web browser. It describes your Web Service, including its location.

Appendix: Extra Labs

Lab A1: Using the Server Console

Open the Current Tasks Popup
1. Launch the Server Console from the Launchpad and open the Current Tasks Popup. A popup window should open.
2. Resize the main Server Console window and the popup window so that you can see them side-by-side in your display.
In a moment we will run a job. You should briefly see it running in the popup window.

Run a Job
1. Within the Jobs list, expand the Data Cleanup and Profile project. Right-click the Cleanup and Profile job and select Run...
2. Leave the Run Profile dropdown empty, and type in a Run Label of First Run. Click OK to run the job.

View Results
3. Navigate to the Server Console's Results window. Your job should be displayed in the Job History panel at the top of the screen (note that it is listed under the run label of First Run and that the project name, job name and end time are also displayed). Click your job. In the Results panel at the bottom of the screen you should see the results of a Quick Stats Profiler.

View the Event Log
1. Navigate to the Server Console's Event Log window.
2. In the Filter, click to the left of Job Details to expand the filtering options for jobs.
3. In the Job section, locate and select your Cleanup and Profile job.
4. Apply the filter. The Event Log should now be filtered so that it displays only the log entries for a single job.
5. In the Event Log, double-click the row with an Event Type of Task and a Sub Type of End. The Task Log should open in a new window.

Lab A2: Externalize Options and Set Overrides in a Run Profile

In this lab, you will be working with customer data from two different countries: the United States and the United Kingdom. Data from the two countries is held in different files. Your aim is to clean and then pattern-check the telephone numbers in the customer data. Since telephone number formats in the US and the UK are different, you will need to check against different patterns in each case. You will also need to output valid telephone numbers to different files: one file for US customers and another file for UK customers. You will set up your job to check US customers' telephone numbers by default, but will then create a Run Profile to override the default settings so that you can use the same job to audit UK customers' telephone numbers.

You will create a data store, snapshot some data and then create a process and an export. You will externalize some of the options and then create a job to run your snapshot, process and export. Finally, you will create a Run Profile to override your externalized options and run the job using the Run Profile in the Server Console user interface.

Create and Test a Process
1. Using Windows Explorer or a similar application, navigate to C:\share\edq_training_assets\Data Files\fundamentals. Locate the following files and copy them to your EDQ landing area: uk_customers.txt and us_customers.txt. The location of the landing area on your virtual machine is /opt/Oracle/Middleware/edq/config.
2. Open both files in a text editor. Note that the format and content of the two files are similar, except that one file contains customers from the United Kingdom, whilst the other contains customers from the U.S.A. Also note the format of the phone numbers you see in the files.
3. Within an EDQ project, create a Data Store for a delimited text file. Ensure that you select that Data is accessed from the server. You are going to configure the data store to access the us_customers.txt file, so:
1) Enter the filename into the File in server work area field.
2) Select the Treat first row as header checkbox.
3) Select a Field separator of Pipe symbol (|).
4. Name the Data Store FileConnection and test that it works.
5. Snapshot the data from the us_customers.txt file. Accept all of the snapshot defaults except the name. Name your snapshot IncomingCustomerData.
6. Create a process using the IncomingCustomerData snapshot. Name your process CheckPhoneFormat.
7. In the C:\share\edq_training_assets\Data Files\fundamentals folder you will find a package file called phone_patterns.dxi. Right-click any whitespace within the Director Project Browser and select Open Package File... Navigate to the phone_patterns.dxi file and open it. In the Director Project Browser, locate the phone_patterns.dxi folder, expand its Projects node and its Phone Patterns node, and copy the reference data within to your project's local reference data.
8. Within your project, double-click each set of reference data in turn. Note that each reference data table contains one row only: a single valid phone number format for the US and the UK respectively. The two patterns are different.
9. Within your EDQ process, connect up processors in series to trim the whitespace from the left and right of the Phone attribute and to denoise it. Ensure that in the Trim Whitespace processor you select the Trim option Left and Right.
10. At the end of the series, connect up the Pattern Check processor and select Phone as the field for validation. Navigate to the Options tab and select USPhonePatterns as the reference data for Valid Patterns (you may need to de-select the Filter by category checkbox in order to see the reference data). In the Categorize unmatched as dropdown, select Invalid. (A regular-expression sketch of this kind of check appears after the questions below.)
11. Connect up a Writer to the Valid output of the Pattern Check processor. Select all attributes for writing and create some new staged data. Call your new staged data ValidPhoneNumbers.
12. Connect up a Writer to the Invalid output of the Pattern Check processor. Select all attributes for writing and create some new staged data. Call your new staged data InvalidPhoneNumbers.
13. Run the process from Director, ensure that it works and study the results.
How many records contain valid phone numbers?

How many records contain invalid phone numbers?
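After the trim and denoise steps, a pattern check of this kind reduces to a regular-expression match. A sketch with Python's re module; the two patterns are assumptions standing in for the single-row USPhonePatterns and UKPhonePatterns reference data:

    import re

    # Stand-ins for the reference data; the real patterns may differ.
    US_PATTERN = re.compile(r"^\(\d{3}\) \d{3}-\d{4}$")
    UK_PATTERN = re.compile(r"^\d{5} \d{6}$")

    def check_phone(raw, pattern):
        cleaned = " ".join(raw.split())  # roughly: trim plus whitespace denoise
        return bool(pattern.match(cleaned))

    print(check_phone(" (212) 555-0147 ", US_PATTERN))  # True -> Valid output
    print(check_phone("20 7946 0958", US_PATTERN))      # False -> Invalid output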

14. In the Process Canvas, single-click the Pattern Check processor. Then, in the Results Browser, stage the top-level results summary from the Pattern Check processor. In the Publish Results dialog, create new staged data and name it PhoneResults. Note that the new staged data table appears under the Staged Data node in the Project Browser.

Create an Export
1. Create a new Data Store to output data to a comma-delimited text file called us_valid_phone_numbers.csv. Ensure that the data store accesses data from the server. Call your data store PhoneNumberOutputConnection.
2. In the Project Browser, right-click Exports and select New Export... Select your ValidPhoneNumbers staged data and click Next >. Select your PhoneNumberOutputConnection Data Store and click Next > again. Name your export Valid Phone Export and click Finish.

Select Your Externalization Options
1. In the Project Browser, right-click your IncomingCustomerData snapshot and select Externalize...
2. In the Snapshot Externalization dialog, select the File in server work area checkbox. This will allow you to switch to a different source data file in your run profile. Note the other available options, and click OK to close the dialog.
3. In the Process Canvas, double-click the Pattern Check processor and navigate to the Options tab.

Lab A7: Use a Data Interface in Jobs

3. Click Next > to move to the Mappings page. Note that the Title, Given Name and Email Address attributes from the snapshot have been automatically mapped to your Data Interface. This is because the attribute names in the snapshot are the same as those you defined in the Data Interface. The Last Name attribute in the snapshot, however, has not been mapped. To map Last Name, manually drag a line from the symbol to the right of Last Name in the Home Insurance Customers snapshot to the symbol to the left of Family Name in the Data Interface. Each of the four attributes in your Data Interface should now be mapped to an attribute in your Home Insurance Customers snapshot.
4. Click Next > and give your Data Interface Mapping a Name of Mapping to Home Insurance Customers. Then click Finish.
5. Click again to open the New Data Interface Mappings wizard. The second mapping we will create is to the snapshot of your second customer data file. Leave the Type set to Staged data, but this time select Mortgage Customers [Data in] as the Name.
6. Click Next > to move to the Mappings page. Note that for this mapping, only the Title attribute from the snapshot has been automatically mapped to your Data Interface (the other attribute names in the snapshot and the Data Interface differ). Manually map the appropriate attributes in the Mortgage Customers snapshot to the remaining three fields in the Data Interface by dragging a line from the symbol to the right of the Mortgage Customers snapshot attribute to the symbol to the left of the Data Interface attribute.
7. Once you have mapped all of the Data Interface attributes to the relevant Mortgage Customers attributes, click Next >. Give your Data Interface Mapping a Name of Mapping to Mortgage Customers. Then click Finish.
8. In the Data Interface Mappings dialog, you should see the two mappings you have created. Click OK to close the dialog.
Note that it is possible to create Data Interfaces and mappings based upon existing Staged Data.
To do this, right-click the Staged Data in the Project Browser and select Create Data Interface Mapping... Then, in the dialog that opens, click New Data Interface and simply follow the wizard.

Create Staged Data Targets
In our exercise, we are going to assume that we need two different types of output from our process, one containing all four of our Data Interface's attributes, and the other only requiring Family Name and Email Address. We will create a Staged Data table for each of these different outputs.
1. In the Project Browser, right-click Staged Data and select New Staged Data to open the New Staged Data wizard.
2. Create four string attributes, named Title, Given Name, Family Name and Email Address respectively.
3. Click Next > and give your staged data a name of Processed Customer Data Home Insurance. Then click Finish.
4. Repeat steps 1 to 3 to create a second Staged Data table, this time naming it Processed Customer Data Mortgage. In your second staged data table, include only two attributes: Given Name and Family Name.

Create Output Mappings for Your Data Interface
1. Since in our case the attributes we want to output are the same as (or a subset of) the attributes we wanted to input, we can use the same Data Interface for both reading and writing data, albeit with different mappings. So, right-click the Customer Data Interface in the Project Browser and select Mappings... The Data Interface Mappings wizard will open.
2. Open the New Data Interface Mappings wizard. We will now create a mapping to our first set of staged data for output. Leave the Type set to Staged data. Select a Name of Processed Customer Data Home Insurance.
3. Click Next > to proceed to the Mappings page. All of the attributes should have been automatically mapped. If this is not the case, manually map any remaining attributes in the Data Interface to appropriate attributes in your Processed Customer Data.
4. Once all of the attributes are mapped, click Next >. Give your Data Interface Mapping a Name of Output Mapping Home Insurance. Then click Finish.
5. Open the New Data Interface Mappings wizard again. You will now create a mapping to your second set of staged data for output. Leave the Type set to Staged data. Select a Name of Processed Customer Data Mortgage.
6. Click Next > to proceed to the Mappings page. Note that the Given Name and Family Name attributes should have been automatically mapped (if this is not the case, map them manually).
7. Click Next >. Give your Data Interface Mapping a Name of Output Mapping Mortgage. Then click Finish.
8. In the Data Interface Mappings dialog, you should now see the four mappings you have created. Click OK to close the dialog.

Create a Process to Clean Customer Data
Data Interfaces are most valuable when you have developed a complex process that you want to re-use with multiple data sources or targets in which the format of the data differs. However, since speed is of the essence in training, we are going to create a simple process that can be used against different sources and targets, so that you can quickly see how Data Interfaces work. (A compact sketch of the mapping idea follows the steps below.)
1. Right-click the Processes node within the Project Browser and select New Process...
2. In the Data Source dialog, select Data Interfaces from the dropdown. Choose your Customer Data Interface and then click Next >.
3. Click Next > through every screen in the wizard, accepting the defaults.
4. Connect the following processors in series, selecting the Given Name and Family Name attributes for each:
1) Trim Whitespace
2) Denoise
3) Proper Case
5. Drag a Writer onto the Process Canvas and connect it to the final processor in the series.
6. Configure the writer so that all four attributes are selected for writing. In the Type dropdown, select Data Interfaces and in the Name dropdown, select Customer Data Interface.
7. Click Map By Name. The selected outputs for writing should be automatically mapped to the appropriate inputs of the Customer Data Interface. If this is not the case, manually connect them.
8. Click OK to close the Writer Configuration dialog and then run the process.
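The value of a Data Interface is that one process sees a single canonical schema while each source keeps its own column names; a mapping translates between the two. A compact pandas sketch of that idea; the mortgage-file column names below are invented for illustration:

    import pandas as pd

    CANONICAL = ["Title", "Given Name", "Family Name", "Email Address"]

    # One mapping per source: source column -> Data Interface attribute.
    MORTGAGE_MAPPING = {  # invented source column names
        "Title": "Title",
        "Forename": "Given Name",
        "Surname": "Family Name",
        "Email": "Email Address",
    }

    def read_through_interface(path, mapping):
        # Rename source columns to the interface attributes, in a fixed order.
        return pd.read_csv(path, dtype=str).rename(columns=mapping)[CANONICAL]

    def clean(df):
        # The shared process: trim, denoise whitespace, proper-case the names.
        for col in ["Given Name", "Family Name"]:
            df[col] = (df[col].str.strip()
                              .str.replace(r"\s+", " ", regex=True)
                              .str.title())
        return df

    # Usage (file name hypothetical):
    # clean(read_through_interface("mortgage_customers.csv", MORTGAGE_MAPPING))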

Does your process run successfully? If not, what error message is returned?

Select Mappings for Your Process and Re-Run It
1. At the top of the Process Canvas, open the Run Preferences dialog.
2. On the right-hand side of the dialog, click the Customer Data Interface Reader to select it. In the Selection drop-down, choose Mapping to Home Insurance Customers.
3. Next, click the Customer Data Interface Writer to select it. The writer can write out to multiple Staged Data tables simultaneously. For this exercise, however, just select Output Mapping Home Insurance.
4. Click OK to close the Run Preferences dialog and run the process again.
Study the results of your process.
How many records were input by the reader?

In the Project Browser, select the Processed Customer Data Home Insurance staged data. How many records have been output and how many attributes are there?

Number of records:
Number of attributes:

Change Your Reader's Mapping and Re-Run the Process
1. At the top of the Process Canvas, open the Run Preferences dialog.
2. On the right-hand side of the dialog, click the Customer Data Interface Reader to select it. In the Selection drop-down, choose Mapping to Mortgage Customers.
3. Next, click the Customer Data Interface Writer to select it. Ensure that only Output Mapping Mortgage is selected.
4. Click OK to close the Run Preferences dialog and run the process again.
Study the results of your process.
How many records were input by the reader?

In the Project Browser, select the Processed Customer Data Mortgage staged data. How many records have been output and how many attributes are there?

Processed Customer Data Mortgage
Number of records:
Number of attributes:

Create a Job to Clean Home Insurance Customer Data
1. At the top of the Process Canvas, open the Run Preferences dialog.
2. Click Save as Job. Enter a Name of Customer Data Cleaner - Home Insurance and click OK. Your new job should open in the Process Canvas.
3. At the top of the Process Canvas, add a new Phase to your job. Double-click the New Phase and change its Phase Name to Import. Click OK and then move the Import phase to the top (start) of your job.
4. In the Tool Palette, expand the Snapshot node. You should see both of the snapshots you created (Home Insurance Customers and Mortgage Customers). Drag the Home Insurance Customers snapshot into your Import phase.
5. Within the phase of your job that contains the process (called Customer Data Interface if you accepted all default values), double-click the Customer Data Interface Reader to select it. In the Selection drop-down, ensure that Mapping to Home Insurance Customers is selected.
6. Click OK to close the dialog.
7. Double-click the Customer Data Interface Writer to select it. In the Selection drop-down, ensure that only Output Mapping - Home Insurance is selected.
8. Click OK to close the dialog and then run the job.
9. Open the Event Log and view the results of your job. Double-click the Job End event. A report should open in your web browser.
How many records were imported by the snapshot?

How many records were read into and written out of the process?

Create a Job to Clean Mortgage Customer Data
1. Create a second job to: read data from the Mortgage Customers snapshot, clean the data using the process you have already created, and stage the cleaned data in the Processed Customer Data Mortgage Staged Data.
2. Run your second job.
3. Open the Event Log and refresh it. Double-click the Job End event for the latest job. A report should open in your web browser.
How many records were imported by the snapshot?

How many records were read into and written out of the process?

Lab A8: Configuring the Dashboard

Since we communicate regularly with our customers, we are going to configure the dashboard to display the quality of the data we hold in the email, title and name fields.

Create a New Process
1. Within your Training Project, create a New Process and call it Quality of Contact Data, associating it with your Customers snapshot.
2. Locate, drag on and connect up processors to check for:
a) Data in the TITLE attribute.
b) Invalid (noise) characters in the NAME attribute.
c) Valid email addresses.

Configure Your Process to Publish to the Dashboard
1. For each of the processors in turn, right-click and select Configure, navigate to the Dashboard tab, select the Publish to Dashboard check-box and enter a meaningful Rule Name (e.g. Data in Title Field).
2. At the top of the Process Canvas, open the Run Preferences dialog. Select the Publish to Dashboard? check box and then click OK to close the dialog.
3. Run your process.

Add a Summary to the Dashboard
1. Navigate to the Oracle Enterprise Data Quality Launchpad in your web browser and launch the Dashboard, logging in as the dnadmin user (bear in mind that you may have changed your password to dnadmin1 at the beginning of the briefing).
2. Click Customize. From the Add New drop-down, select your project / process and then click Add (for example, select Training Project / Quality of Contact Data). A summary will be added for your process. You should see that there is one rule for each of the audit processors you added above.
3. Click the normal link to return to browsing mode and then click the Name of the summary to drill down to see results for your rules.
What is the status of each of your three rules (red, amber or green)?

Adjusting Thresholds
1. Click the Dashboard's Administration link. The Dashboard Administration interface will download via Java Web Start.
2. Click Default Thresholds.
3. In the Red section of the Rules tab, change the percentage threshold for Alerts to 5% and click Save.
4. Return to the Dashboard and refresh your web browser.
What effect does this have on the status of your rules?

5. Return to Dashboard Administration and navigate to the Dashboard view.

How would you set a custom threshold to apply to a single summary or rule?

6. In the Dashboard (not Dashboard Administration!), try clicking beside a rule. This will take you to a history graph that shows the change in your rule's results over time. Since the results have not yet changed, at the moment the graph will report that no history is available. If you were to change the source data and run your snapshot and process again, the history graph would display line or bar graphs showing movements in your results over time.

Adding an Index
1. In Dashboard Administration, click New Index and, in the Add Index dialog, enter a name. The new index will be added to the list of Dashboard Elements.
2. Under Audits and Indexes, expand Audits and then expand the node for your summary. Drag each of the rules from Audits & Indexes and drop them on to your index under Dashboard Elements.
3. Under Dashboard Elements, right-click your index and select Custom Weightings. In the Index Weightings dialog, change the Weight of one of your rules to 3 and leave the weight of the other rules set to 1. Click OK. This will give a higher weighting to one factor than the others.
4. Finally, drag your index from Dashboard Elements and drop it over the Administrators User Group. This will mean that members of the group, including the dnadmin user, will be able to view the new index.
Note that summaries are automatically added to the Administrators group. For members of other groups to see them, you will need to drag them from the Dashboard Elements area and drop them over the appropriate group in the User Groups area.
5. Click Save and close Dashboard Administration.
6. Return to the Dashboard and, in the breadcrumb near the top of the screen, click the My Dashboard link.
7. Click Customize (you may need to click your browser's back button first). From the Add New drop-down select your index and then click Add (if you can't see your index, log out of the Dashboard and then log back in again). Your new index will be displayed on the dashboard.
8. Click the normal link to return to browsing mode and then click the Name of the index to drill down to see results for your three rules.

In the exercise above, the summary that you displayed on the dashboard was automatically generated when you published the results of your process and processors to the dashboard. However, it is also possible to create new summaries manually in Dashboard Administration. Rules relating to any audit or parse process within any project can be added to a summary, provided that the relevant process and processor are both configured to publish their results to the dashboard.
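As a rough mental model of what the red threshold and custom weightings do, the sketch below computes a rule status and a weighted index score from per-rule pass percentages. The formulas (a simple threshold test and a weighted average) are assumptions for illustration; the Dashboard's actual scoring and status logic may differ.

```python
# Illustrative calculations only -- assumed formulas, not necessarily
# what the EDQ Dashboard computes internally.

def status(alert_percentage, red_threshold=5.0):
    """Assumed rule: more than red_threshold % alerts turns a rule red."""
    return "red" if alert_percentage > red_threshold else "green/amber"

def index_score(rules):
    """rules: list of (pass_percentage, weight) tuples; weighted average."""
    total_weight = sum(w for _, w in rules)
    return sum(p * w for p, w in rules) / total_weight

# One rule weighted 3, the other two weighted 1, as in the exercise.
rules = [(98.0, 3), (80.0, 1), (90.0, 1)]
print(round(index_score(rules), 1))  # 92.8 -- dominated by the weight-3 rule
print(status(100 - 98.0))            # green/amber (2% alerts, under 5%)
print(status(100 - 80.0))            # red (20% alerts exceeds 5%)
```

This is why lowering the red Alerts threshold to 5% in the earlier exercise can flip rules to red even though the underlying data has not changed.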
Lab A9: Users and Security

Create a Group
1. Open the Enterprise Data Quality Launchpad in your web browser and click User Configuration, logging in as the dnadmin user (bear in mind that you may have changed your password to dnadmin1 at the beginning of the briefing). In the Enterprise Data Quality Users page you should see a single user, dnadmin, which is assigned to the Administrators group.
2. Click Groups. In the Enterprise Data Quality Groups page you should see a list of all of the default groups with their associated permissions. Scroll to the bottom of the page and click Add Group.
3. In the Enterprise Data Quality Create a new Group page, name your group Limited Access.
4. Assign your group the following roles:
a) Data: View Reference Data
b) Director: Process.Modify
c) Director: ReferenceData.Add
d) Director: ReferenceData.Modify
5. Click Apply. Your new group should appear in the Enterprise Data Quality Group Configuration page.

Create a User
1. Click Users and Groups in the breadcrumb at the top of the page. Then click Add User.
2. In the Enterprise Data Quality Create a new User page, create a user of your choice, filling in the following fields:
a) Username.
b) Password.
c) Retype Password.
d) Full Name.
Write down the credentials you use, as you will need them shortly.
3. Select your Limited Access group and assign it to your new user.
4. Click Apply. Your new user should appear in the Enterprise Data Quality User Configuration page.

Assign Your Group to an Application
1. Click Home. Then click Server Configuration and Applications. In the Enterprise Data Quality Applications page you should see a list of applications (those with ticks can be launched from the Launchpad home page).
2. Click Groups beside Director. In the Enterprise Data Quality Application Groups page, assign your new Limited Access group to Director. Then click Save.
3. Click Home to return to the Launchpad.

Assign Your Group to a Project
1. Unless you are already logged into Director as the dnadmin user, launch the application and log into it as dnadmin.
2. In the Project Browser, right-click your Training Project and select Properties.
3. In the Training Project Properties dialog, navigate to the Security tab and click Configure.
4. Deselect the All Groups check box. Choose your Limited Access group from the list of Available Groups and click > to make it a selected group. Click OK and then Close.
5. Close Director.

Test Your Configuration
1. Re-launch Director, logging in as the new user you created above.

When logged on as your new user, can you:
a) View global, system-level reference data?

b) Create new global, system-level reference data?

c) Modify existing global, system-level reference data?

d) Create new reference data within your project?

Are your answers above in line with your expectations given the configuration of the Limited Access group?

Lab A10: Audit Case Study: Assess Contact Details
In this case study you will use a range of Audit processors to assess the quality of customer contact details in your data.
Note that this case study is written as if it were a real request from a client, sent by email. Please read through all of it carefully before you start. You may find it useful to draw out your process roughly on paper before you start configuring it in Oracle Enterprise Data Quality.

Hi ____________________ (insert your name here :-),
We need to issue a business-critical communication to all of our customers in order to apologize for a recent service outage and to offer each customer a 5% reduction in their next bill. In order to do this, we want to use each customer's preferred contact method. This will always be one of:
- Telephone
- Email
- Letter
(Unfortunately, we don't yet know which communication preference each customer has chosen.)
In order to assess our readiness for this communication, could you please provide the following metrics:

1) Number of customers with a valid physical mailing address.

2) Number of customers with all communication methods valid.

3) Number of customers without a valid physical mailing address, but who do have a valid email address or telephone number.

Please bear in mind the following key points:
- Our working definition of a valid mailing address is: ADDRESS1, ADDRESS2 and ADDRESS3 are all populated, and the POSTCODE field is populated with an entry in a valid format.
- Our definition of a valid email address is: the EMAIL field is populated with a value in a valid format that is no more than 45 characters long (this is because our bulk mailing system only accepts email addresses of up to 45 characters).
- Our definition of a valid telephone number is: the AREA_CODE attribute is populated with exactly four numeric characters and the TEL_NO attribute is populated with exactly six numeric characters.

Use Oracle Enterprise Data Quality's Audit processors (and, where necessary, transformation processors) to find the metrics that your client wants and enter them into the boxes above. Note that there are several ways to complete this case study. However, please bear in mind two pieces of general guidance:
- Consider how you can use flags to help you to provide the metrics required by the customer.
- Consider using the Logic Check processor to assess the value in multiple flags simultaneously.
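If it helps to reason about the rules before building them in EDQ, here is a minimal Python sketch of the client's three validity definitions applied to one record. The postcode and email patterns are illustrative assumptions (the brief only says "a valid format"); the field names match the attributes in the brief.

```python
import re

# Illustrative checks for the client's validity definitions. The postcode
# and email regexes are assumptions -- the brief does not specify a format.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$", re.I)
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def valid_mailing_address(r):
    # ADDRESS1, ADDRESS2 and ADDRESS3 all populated, POSTCODE in a valid format.
    populated = all((r.get(f) or "").strip() for f in ("ADDRESS1", "ADDRESS2", "ADDRESS3"))
    return populated and bool(UK_POSTCODE.match(r.get("POSTCODE") or ""))

def valid_email(r):
    e = (r.get("EMAIL") or "").strip()
    return len(e) <= 45 and bool(EMAIL.match(e))

def valid_telephone(r):
    # Exactly four digits in AREA_CODE and exactly six digits in TEL_NO.
    return bool(re.fullmatch(r"\d{4}", r.get("AREA_CODE") or "")) and \
           bool(re.fullmatch(r"\d{6}", r.get("TEL_NO") or ""))

record = {"ADDRESS1": "1 High St", "ADDRESS2": "Anytown", "ADDRESS3": "Anyshire",
          "POSTCODE": "AB1 2CD", "EMAIL": "jo@example.com",
          "AREA_CODE": "0122", "TEL_NO": "334455"}
flags = (valid_mailing_address(record), valid_email(record), valid_telephone(record))
print(flags)                                    # (True, True, True)
print(all(flags))                               # metric 2: all methods valid
print(not flags[0] and (flags[1] or flags[2]))  # metric 3 condition
```

In EDQ itself, each flag maps naturally onto one or more audit processors feeding a Logic Check, as the guidance above suggests.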

LAB A11: Transformation Case Studies

1) Change Negative Balances to Zero
Capture any records from the Customers table that have a negative balance. Ensure that the negative balance is replaced with zero.

2) The 1,000 Club
Establish which customers have made at least one payment of 1,000 or more since 1st Jan 2000. Store the details in a new Staged Data object named 1000 Club (so that it can be used in a new process elsewhere). Include their title, their name, the business name (if there is one), their email address, the payment amount and the payment date.

3) Determine the Best and Worst Creditors
Create a text output file that includes an attribute to show the time taken between a purchase order being raised and payment by the customer.
Prior to creating the new attribute, disregard any records where a payment has not been received from the customer (i.e. the last payment date is before the last purchase order raised date, or there is no last payment date). Also remove any records that have a last payment date of midnight on 23rd Dec 2010 (these are deliberately spiked records from an earlier exercise). For the purposes of this exercise, assume that the last payment made by the customer is for the invoice created when the last purchase order was raised.

4) Customers with Missing or Invalid Postcodes
Check customer records for valid postcodes in either the Postcode column or Address3. For those records that have no valid postcode in either attribute, export the list of records to an Excel file, noting Title, Name, Email Address and Phone Number. Add the Area Code to the phone number. For those with no area code, mark the record with N/A in the Phone Number column.

5) Calculate Balance: Transform and Profile Case Study
In the Profile module we noted that the Balance attribute in the customers table of the Service Management database appeared to contain a large number of zero values. In this case study, your task is to calculate the balance from the source data in order to compare your results to the balances presented in the customers table. In the Service Management database, the WorkorderInvoice table contains the amount invoiced for each workorder, and the Payments table contains the payment amount for each workorder. In order to calculate the balance for each customer you must:
- Sum the invoice amounts for each customer's workorders.
- Sum the payment amounts for each customer's workorders.
- Subtract the total payment amount from the total invoice amount for each customer.
Your results should:
- Show the balance rounded to two decimal places.
- Display the frequency with which each distinct balance amount occurs.
- Display the distribution of the balance in number bands.
- Include all of the workorder and purchase order numbers and the workorder dates associated with each customer's balance.
- Be presented in a single spreadsheet.
Your process should be automated so that it can be run regularly each night in order to calculate customers' balances as new invoices are raised and new payments are received.
You may find it useful to include the following processors in your solution:
- Lookup and Return
- Group and Merge
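Outside EDQ, the balance arithmetic from case study 5 (and the negative-balance rule from case study 1) can be sanity-checked with a few lines of pandas. The table and column names below (workorder_invoice, payments, customer_id, amount) are hypothetical stand-ins for the Service Management schema, not its actual identifiers.

```python
import pandas as pd

# Hypothetical stand-ins for the WorkorderInvoice and Payments tables.
workorder_invoice = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [120.50, 79.49, 300.00],
})
payments = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "amount": [100.00, 150.00, 100.00],
})

# Case study 5: balance = total invoiced - total paid, per customer,
# rounded to two decimal places.
invoiced = workorder_invoice.groupby("customer_id")["amount"].sum()
paid = payments.groupby("customer_id")["amount"].sum()
balance = invoiced.sub(paid, fill_value=0).round(2).rename("balance")
print(balance)                 # customer 1: 99.99, customer 2: 50.00

# Frequency of each distinct balance amount.
print(balance.value_counts())

# Distribution of the balance in number bands (band edges illustrative).
print(pd.cut(balance, bins=[-1000, 0, 50, 100, 1000]).value_counts())

# Case study 1: replace negative balances with zero.
balance_clipped = balance.clip(lower=0)
```

In EDQ the same aggregation is what Group and Merge provides, with Lookup and Return bringing the workorder and purchase order details back onto each customer's balance row.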
Lab A12: Installing Oracle Enterprise Data Quality On Windows

Install Oracle Enterprise Data Quality on Windows
In order to install Oracle Enterprise Data Quality you will need local administrator rights over your machine.
You can download Oracle Enterprise Data Quality from https://edelivery.oracle.com; alternatively, it may have been provided to you on a memory stick or another medium by your instructor.
If your Enterprise Data Quality release came in a zip file, extract the file's contents.
Open the folder containing the Enterprise Data Quality release and launch the installer by double-clicking dnDirectorSetup.exe.

Note that if you are installing on a Windows Vista or Windows 7 machine, you may need to right-click dnDirectorSetup.exe and select the Run as administrator option.
1. Click Next.
2. In the Functional Packs dialog, select all of the check boxes and click Next>.
Note that you should select all functional packs so that all EDQ functionality is available to you during training. Outside of the training environment, a customer's license will determine which packs they should select.
3. In the Choose Install Location dialog, click Install. Do not change the Destination Folder.
4. The green progress bar will show that Oracle Enterprise Data Quality is installing. This typically takes several minutes.
5. If a Windows Firewall dialog box appears saying that some features of the program have been blocked, click Unblock.
6. Click the Finish button to end the installation.

Create a Shortcut for the Oracle Enterprise Data Quality Launchpad
The Launchpad is a web page provided by the Datanomic web server. Depending on the set-up of the system, there will be a number of links available to the user to launch different parts of the software. The URL of the Launchpad will depend on the location of your web server and the port number used on the server machine.
Select Enterprise Data Quality / Enterprise Data Quality Launchpad from the Start Menu to bring up the Launchpad.
You may find it useful to note the URL in the box below for future reference:

In your web browser, mark the Launchpad web page as a favorite or bookmark it so you can easily return to it in future.

Oracle Enterprise Data Quality Demonstration Assets
Version 6.08.2012

Prerequisite
In order to run the demos outlined in this document you must first download and install one of the following EDQ VM images:
- EDQ-JUN2012 EDQ Suite VM with Demos: VM image without Address Verification
- EDQ-JUN2012 EDQ Suite VM with AV and Demos: VM image with Address Verification
  - Part 1: EDQ-JUN2012 EDQ Suite with AV and Demos
  - Part 2: EDQ-JUN2012 EDQ Suite with AV and Demos
  - Part 3: EDQ-JUN2012 EDQ Suite with AV and Demos
  - Part 4: EDQ-JUN2012 EDQ Suite with AV and Demos
  - Part 5: EDQ-JUN2012 EDQ Suite with AV and Demos
If you are installing the EDQ VM image with Address Verification (AV), you'll need to download all 5 parts into a single directory and then unzip the combined files using an unzip utility. The unzip utility will automatically combine the 5 separate files into a single VM image file.
Instructions for downloading and installing the VM images are available on the Oracle Retriever site: http://retriever.us.oracle.com/apex/f?p=121:22:882419017231029::NO:RP:P22_CONTAINER_ID,P22_PREV_PAGE:56547,2271096
If you have trouble downloading the VM images using IE, please try the Firefox browser. Once you have successfully installed the VM image and loaded the Enterprise Data Quality Launchpad, you can access all the demo assets.

Demo Assets
The EDQ corporate team (Product Management, Product Strategy, Engineering, and NAA Sales Consulting) has created a series of demo projects that enable the sales organization to demonstrate all the functional capabilities available within the EDQ product family. Each project has been created to demonstrate a specific set of functionality. Here is a brief description of each project:

Address Verification Basic Demo
This project is made up of three processes and is intended to provide a basic Address Verification demo. The demo comes with 127,147 addresses across 172 countries. The following outlines each of the three processes:
- Process 0 - Extract Addresses: A simple process that allows the user to extract address records by country to help focus the demo on the correct locale. The user can identify one or more countries for the address extraction process.
- Process 1 - Verify Addresses Batch: A process to run the addresses extracted in Process 0 through the EDQ AV API with no additional processing.
- Process 2 - Summarize Address Verification Batch: This process summarizes AV results and publishes them to the Dashboard.

Check Digit Processors
This project contains the following three processors:
- ISBN Check Digit Check: validates 10-digit ISBN numbers (which can end in X). Use the UPC check for 13-digit ISBNs.
- Luhn Check Digit Check: validates numbers using a Luhn Mod 10 DAD (Double Add Double) check digit, such as credit cards, Canadian SINs (Social Insurance Numbers) and US NPIs (National Provider Identifiers, when prefixed with 80840).
- UPC Check Digit Check: validates a range of item codes including UPC-A, EAN-13, SSC-14 / GTIN, SSCC-18, EAN-8, ISBN-13 and BLN (Bill of Lading Number).
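Both of the first two checks are public standards, so they are easy to sanity-check outside EDQ. The Python sketch below implements the standard Luhn Mod 10 check and the ISBN-10 check digit; it illustrates the published algorithms rather than EDQ's internal processor code.

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn Mod 10 ('double add double') check."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 when the
    # doubled value exceeds 9 (equivalent to summing its digits).
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def isbn10_valid(isbn: str) -> bool:
    """ISBN-10 check: weighted sum mod 11, where a final 'X' means 10."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for i, c in enumerate(chars):
        if c == "X" and i == 9:
            value = 10
        elif c.isdigit():
            value = int(c)
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0

print(luhn_valid("4539578763621486"))  # True: a Luhn-valid test number
print(isbn10_valid("0-306-40615-2"))   # True: a classic ISBN-10 example
```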
Cloud Auto Demo
This project is an end-to-end demonstration built around an auto dealership company that has customers, sites (dealership locations), and products (cars and tires). The demonstration assumes a business and technical audience and includes the following items:
Oracle products included in the demonstration:
- Enterprise Data Quality Profile and Audit
- Enterprise Data Quality Match and Merge
- Enterprise Data Quality Parsing and Standardization
- Enterprise Data Quality Product Data Parsing and Standardization
- Enterprise Data Quality Address Verification (World Data Pack installed)
Data domains include Customer, Site, and Product records:
- Customer data: 5,438 records
- Site data: 1,406 records
- Product data: 5,705 records
Functional capabilities:
- Data Profiling
- Data Transformation and Standardization
- Matching and De-duplication
- Address Verification
- Attribute Extraction and Standardization
The following outlines each of the six processes:
- Cloud Auto Customer: This process will profile, transform, validate, and enrich the customer records. In addition, the system will perform an Address Verification procedure on the data before it has been cleansed, and another Address Verification procedure after the customer data has been standardized and enriched. The purpose of this process is to demonstrate how EDQ works with party data.
- Cloud Auto Dealerships: This process will profile, transform, validate, and enrich the dealership/site records. In addition, the system will perform an Address Verification procedure at the end of the standardization and enrichment procedures. The purpose of this process is to demonstrate how EDQ works with site or supplier data.
- EDQP Standardize and Classify Auto Records: This process will use a customized processor to parse the information within an intelligent part number (the Vehicle Identification Number, or VIN) to enhance the available product information. The system will then pass the unstructured product information to the EDQP module, where the following activities will occur: the system will automatically classify the products according to a user-defined classification structure; based on the product category assigned to each product, the system will perform an attribute extraction and standardization procedure for each of the category-specific attributes; and the system will then profile the information returned from the EDQP processor. Note that in this process the category-specific attributes will be returned in name/value pair format. An output file will be created from this process that will be used in the EDQP Extract Auto Attributes process.
- EDQP Extract Auto Attributes: This process will use the classified records from the EDQP Standardize and Classify Auto Records process as an input for extracting and profiling the category-specific attributes. Each unique category will have a different set of attributes and will therefore have to be processed independently.
- Find a Car: This process uses the cleansed dealership and product records to match an incoming file (Reference Data: Cars To Match) for possible matches. The goal of this process is to allow a customer to submit a list of cars to be matched against the available inventory. The system will return all possible matches found within the product repository. It uses a Data Interface on the Reader, so it will run against a batch file when executed directly from the process window. When the job Start Real Time Find a Car is executed, it runs the same process, but instead of processing the batch file the system runs as a real-time web service. The SC can access the Web Service Tester application and submit single or multi-row requests. For example, type in Honda Civic black and hit the submit button to see the results.
- Find a Dealer: This process uses the cleansed dealership records to match an incoming file (Reference Data: Find Dealership Input Data) for possible matches. The goal of this process is to allow a customer to submit a list of car makes and location information to be matched against the available dealerships across the United States. The system will return all possible matches found within the dealership repository. It uses a Data Interface on the Reader, so it will run against a batch file when executed directly from the process window. When the job Start Real Time Find a Dealership is executed, it runs the same process, but instead of processing the batch file the system runs as a real-time web service. The SC can access the Web Service Tester application and submit dealership requests.
- Results Book - Cloud Auto Results: Each process executed as part of the Cloud Auto demo will write different tabs into this results book. This is a great way to show the results to a prospect at the end of the demo.

Customer Data Demo US Only
An example project built as a basic customer data demonstration. It leverages sample customer data which contains dummy names and can be used in customer demonstrations.

EDQ-EDQP Integrated Demo - Complete
An example project built to demonstrate the direct integration between EDQ and EDQP. The demo showcases EDQP's ability to extract, standardize, and classify product information while leveraging the EDQ matching capabilities to identify functionally equivalent products.

Supplier Management Demo
This project highlights how EDQ can be used to identify duplicate supplier records across multiple countries contained within a single supplier repository. The demonstration assumes a business and technical audience and includes the following items:
Oracle products included in the demonstration:
- Enterprise Data Quality Profile and Audit
- Enterprise Data Quality Match and Merge
- Enterprise Data Quality Parsing and Standardization
Data domain: 8,550 supplier records
Functional capabilities:
- Data Profiling
- Data Transformation and Standardization
- Matching and De-duplication
The following outlines each of the processes:
- Supplier Standardization Demo: This process creates a reference data file that will be used to standardize the many different variants of a supplier into one common standard. The process leverages the EDQ match capabilities and the Match Review application to automate the variant identification process. The output file (Supplier Standardization Table) is used in the next process to improve the de-duplication efforts.
- Supplier Deduplication Demo: This process will profile, transform, validate, enrich and de-duplicate the supplier records. It leverages the EDQ De-Duplication match processor to identify duplicate suppliers within a file and then creates a single gold record for each group of duplicate supplier records. The final output of this process is a consolidated (de-duplicated) supplier master file that has been enriched and standardized.

Transliteration Demo
A standard project for EDQ 9.0 or later that provides advanced transliteration of individual and entity names.
This includes the use of the Oracle EDQ language packs for Arabic, Japanese, Korean, Chinese and Russian, and the basic transliterator for other writing systems.

Web Service Demo - Gender
An example web service process that allows the user to parse names and check genders.

Web Service Demo (Address Verification Real-Time Demonstration)
This demonstration showcases how the EDQ Address Verification system can be integrated into a browser-based verification process. It allows a user to submit a real-time address verification request and maps the results using Google Maps.
The following three jobs must be started before a user can execute the web-based demo:
- Run Cleaning Service
- Run Dashboard Service
- Run Matching Service
Once all three jobs are running, the system is ready to start the web-based application. From the EDQ Launchpad, start the Web Service Demo. Once the system loads the web page, you can enter an individual's name and address, and the system will perform a screening on the name and verify the address. The application provides a list of pre-populated individuals at the top of the screen that you can select to demonstrate its capabilities.

