Migrating a Data Warehouse from Microsoft SQL Server to Oracle 11g

Dylan Kucera, Senior Manager – Data Architecture
Ontario Teachers’ Pension Plan

INTRODUCTION

IT infrastructure is often sized according to a 3 to 5 year growth projection, which is generally understood to be a sensible and cost-effective practice. A sign of success for any deployment is when demand begins to outstrip the capability of the technology after this time period has passed. When looking at the organization’s central Data Warehouse, a DBA or senior technology architect may foresee the need for a stronger technology capability; management, however, may not be so easily convinced. Furthermore, core services such as an organization’s central Data Warehouse can be difficult to replace with a different vendor solution once dozens or even hundreds of mission-critical applications are wired to the existing deployment.

With patience, extensive metrics gathering, and a strong business case, management buy-in may be attainable. This paper outlines a number of hints that are worth considering while crafting a business case for a Data Warehouse migration to present to IT management.

Oracle Database 11g provides a number of key technologies that allow for a gradual Data Warehouse migration strategy to unfold over a period of staged deployments, minimizing and in some cases completely eliminating disruption to the end-user experience. The main purpose of this paper is to outline these technologies and how they can be employed as a part of the migration process. This paper is also meant to point out a number of pitfalls within these technologies, and how to avoid them.

GAINING BUY-IN FOR A DATA WAREHOUSE MIGRATION

When it comes to pitching a Data Warehouse migration, patience isn’t just a virtue, it is a requirement. Go in expecting the acceptance process to take a long time.

Armed with the knowledge that you will return to this topic with the management group a number of times, plan for these iterations and make your first presentation about your timeline for evaluation. Unfold your message in a staged fashion.

Start thinking about metrics before anything else; management understands metrics better than technology. The challenge with this, however, is to make sure the metrics encompass the entire problem at hand. If the bar is set too low because your metrics do not capture the full scope of your Data Warehouse challenges, your goal of beginning a migration path may be compromised, as management may not understand the severity or urgency of the issues.

Remember to tie every message to management about the Data Warehouse to business requirements and benefits. As a technologist you may naturally see the benefit to the business, but don’t expect management to make this leap with you. Highlight ways in which the technology will help meet service levels or business goals.

ARTICULATING BENEFITS OF A DATA WAREHOUSE MIGRATION

Benefits of a Data Warehouse migration must be tailored to the specific business requirements of the organization in question. There are a number of general areas where Oracle Database 11g is particularly strong and will likely stand out as providing significant benefit in any circumstance.

Scalability

While RAC will be the obvious key point around scalability, RAC is actually only part of the solution. Consider how the locking model in Microsoft SQL Server causes an uncommitted writer to block all readers of the same row until the writer commits. Oracle Database 11g, on the other hand, allows readers to proceed with reading all committed changes up to the point in time at which their query began. The latter is the more sensible behaviour in a large-scale Data Warehouse. Microsoft SQL Server will also choose to perform a lock escalation during periods of peak load, in the worst case causing an implicit lock of the temporary space catalog and effectively blocking all work until the escalation is cleared. Oracle has no concept of escalating a row-level lock. Again, the latter behaviour will provide superior service in a busy multi-user Data Warehouse. Also evaluate the mature Workload Balancing capabilities of Oracle Database 11g, which allow preferential treatment of priority queries.
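As a minimal illustration of the read-consistency difference (the table and column names here are hypothetical, and SQL Server is assumed to be running at its default READ COMMITTED isolation level):

-- Session 1 (either platform): update a row but do not commit yet
UPDATE position SET market_value = 102.5 WHERE position_id = 42;

-- Session 2 on Microsoft SQL Server: this SELECT blocks until Session 1
-- commits or rolls back, because the reader waits on the writer's row lock
SELECT market_value FROM position WHERE position_id = 42;

-- Session 2 on Oracle Database 11g: the same SELECT returns immediately with
-- the last committed value, reconstructed from undo as a consistent read
SELECT market_value FROM position WHERE position_id = 42;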

Availability

RAC is of course the key availability feature in Oracle 11g. Be sure to also consider Flashback capabilities, which can allow for much faster recovery from data corruption than a traditional backup/restore model. Evaluate other availability issues in your environment; for example, perhaps your external stored procedures crash your SQL Server because they run in-process, unlike Oracle extprocs, which run safely out-of-process.

Environment Capability

PL/SQL is a fully featured language based on Ada and as such may simplify development within your environment. The Package concept allows for code encapsulation and avoids global namespace bloat for large and complex solutions. Advanced Data Warehousing features such as Materialized Views may greatly simplify your ETL processes and increase responsiveness and reliability.
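As one hypothetical sketch of the Materialized View idea (the detail table PLAY.POSITION_DETAIL and its columns are invented for illustration), a fast-refresh aggregate can be maintained automatically as the detail table is loaded, replacing a hand-written ETL summarization step:

CREATE MATERIALIZED VIEW LOG ON PLAY.POSITION_DETAIL
    WITH ROWID, SEQUENCE (PORTFOLIO_ID, MARKET_VALUE)
    INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW PLAY.MV_PORTFOLIO_VALUE
    BUILD IMMEDIATE
    REFRESH FAST ON COMMIT
AS
SELECT PORTFOLIO_ID,
       SUM(MARKET_VALUE)   AS TOTAL_VALUE,
       COUNT(MARKET_VALUE) AS VALUE_COUNT,  -- required for fast refresh of SUM
       COUNT(*)            AS ROW_COUNT
FROM   PLAY.POSITION_DETAIL
GROUP  BY PORTFOLIO_ID;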

Maintainability

Oracle Enterprise Manager is a mature and fully featured management console capable of centralizing the management of a complex data warehouse infrastructure. Your current environment may involve some amount of replication that was put in place to address scalability. Consider how RAC could lower maintenance costs or increase data quality by eliminating data replication.

Fit with strategic deployment

Perhaps your organization is implementing new strategic products or services that leverage the Oracle Database. Should this be the case, be sure to align your recommendations to these strategies, as this could be your strongest and best understood justification.

EXECUTING A DATA WAREHOUSE MIGRATION

If you are lucky enough to manage a Data Warehouse that has a single front-end such as an IT-managed Business Intelligence layer, then it may be possible for you to plan a “Big Bang” migration. More likely, however, your Data Warehouse has dozens or hundreds of direct consumers, ranging from Business Unit developed Microsoft Access links to complex custom legacy applications. Given this circumstance, a phased migration approach must be taken over a longer period of time. You will need to expose a new Data Warehouse technology that can be built against while continuing to support the legacy Data Warehouse containing a synchronized data set. This paper outlines three Oracle Database capabilities that are key to the success of a seamless large-scale Data Warehouse Migration: Oracle Migration (Workbench), Transparent Gateway (Data Gateway as of Oracle 11g), and Oracle Streams Heterogeneous Replication.

ORACLE MIGRATION (WORKBENCH)

The Oracle Migration features of Oracle’s SQL Developer (formerly known as Oracle Migration Workbench, hereafter referred to as such for clarity) can help fast-track Microsoft Transact-SQL to Oracle PL/SQL code migration. Be aware though that a machine will only do a marginal job of translating your code. The translator doesn’t know how to do things “Better” with PL/SQL than was possible with Transact-SQL. The resulting code product almost certainly will not conform to your coding standards in terms of variable naming, formatting, or syntax. You need to ask yourself the difficult question as to whether the effort or time saved in using Oracle Migration Workbench is worth the cost of compromising the quality of the new Data Warehouse code base.

Executing the Oracle Migration Workbench is as simple as downloading the necessary Microsoft SQL Server JDBC driver, adding it to Oracle SQL Developer, creating a connection to the target SQL Server, and executing the “Capture Microsoft SQL Server” function, as shown below.

Figure 1 : Oracle Migration – Capturing existing code

Once Oracle captures the model of the target SQL Server, you will be able to view all of the Transact-SQL code. The sample below shows a captured Transact-SQL stored procedure that employs a temporary table and uses a number of Transact-SQL functions such as “stuff” and “patindex”.

Figure 2 : Oracle Migration – Existing code captured

Using Oracle Migration Workbench to convert this Transact-SQL to Oracle PL/SQL produces this result:

Figure 3 : Oracle Migration – Translated Code

Notice that the Temporary table is converted to the necessary DDL that will create the analogous Oracle Global Temporary Table. The name, however, may be less than desirable, because tt_ as a prefix does not necessarily conform to your naming standards. Furthermore, the Global Temporary Table is now global to the target schema and should probably have a better name than the Transact-SQL table “Working”, which was isolated to the scope of the single stored procedure. Also notice that because there are often subtle differences in the built-in Transact-SQL functions as compared to similar PL/SQL functions, the Oracle Migration Workbench creates a package of functions called “sqlserver_utilities” to replicate the behaviour of the Transact-SQL functions precisely. Again, this might not be the best choice for a new code base.

Oracle Migration Workbench can also be used to migrate tables, data, and other schema objects. Taking this approach, however, considerably limits your ability to rework the data model in the new Oracle Data Warehouse. Using Oracle Migration Workbench to migrate tables and data is not well suited to a “parallel support” model where both the legacy Data Warehouse and the new Oracle Data Warehouse are kept in sync as applications are migrated. The remaining sections of this paper describe an alternate approach to table and data migration that provides a seamless and paced migration path.

TRANSPARENT GATEWAY (DATA GATEWAY)

Oracle Transparent Gateway (branded Data Gateway as of 11g; this paper uses Transparent Gateway to avoid confusion with the general word “Data”) is an add-on product for Oracle Database that provides access to foreign data stores via Database Links. Transparent Gateway is similar to Heterogeneous Services (included as part of the base Oracle Database license); however, Transparent Gateway is built for specific foreign targets and as such enables features not available in Heterogeneous Services, such as foreign Stored Procedure calls and Heterogeneous Streams Replication.

VIEWS EMPLOYING TRANSPARENT GATEWAY

One way to fast-track the usefulness of your new Oracle Data Warehouse is to employ Views that link directly to the legacy data store. This approach can be used for key tables that will require some planning and time to fully migrate, yet whose availability will greatly influence the adoption of the new Oracle Data Warehouse. Once you have settled on a new table name and column names that meet your standards, a View can be created similar to the following example:

CREATE OR REPLACE VIEW PLAY.VALUE_TABLE_SAMPLE AS
SELECT
    "IDENTIFIER" AS ID_,
    "VALUE"      AS VALUE_,
    FILE_DATE    AS FILE_DATE
FROM
    SampleLegacyTable@MSSQL;

Figure 4 : Oracle View to Legacy table

Perhaps your naming standards suggest that the prefix VIEW_ should be used for all Views. Keep in mind though that this View is destined to become a physical table on Oracle once the data population process can be moved and a synchronization strategy employed. This paper will assume some sort of ETL process is used for data population, but even transactional tables can be considered for a staged migration using this approach so long as the locking model of the legacy system is considered carefully.
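When that day comes, the swap can be transparent to consumers; a rough sketch (the column definitions below are invented for illustration and would come from the real migrated model):

-- Retire the pass-through View and create the physical table under the same name
DROP VIEW PLAY.VALUE_TABLE_SAMPLE;

CREATE TABLE PLAY.VALUE_TABLE_SAMPLE (
    ID_       VARCHAR2(32),
    VALUE_    NUMBER,
    FILE_DATE DATE
);

-- Applications that query PLAY.VALUE_TABLE_SAMPLE are unaffected by the swap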

PITFALLS OF TRANSPARENT GATEWAY (VIEWS)

When developing queries against a View that uses Transparent Gateway such as the one shown above, it is important to remember that these Views are meant as a stop-gap measure.

Creating complex queries against these sorts of Views is a risky venture. For example, consider the following query:

DECLARE
    tDate DATE := '2008-12-31';
BEGIN
    INSERT INTO PLAY.TEMP_SAMPLE_7445
        (ID_, NAME_, PREV_VALUE, CURR_VALUE,
         VALUE_SUPPLIER, DATE_VALUE_CHANGED)
    SELECT ID_, '', '', '', 'SAMPLE', MAX(FILE_DATE)
    FROM PLAY.VALUE_TABLE_SAMPLE
    WHERE FILE_DATE <= tDate
    GROUP BY ID_;
END;

Figure 5 : Complex use of View can cause internal errors

This query, because it inserts into a table, selects constants, uses an aggregate function, filters using a variable, and employs a GROUP BY clause, throws an ORA-03113: end-of-file on communication channel error (the alert log shows ORA-07445: exception encountered: core dump [intel_fast_memcpy.A()+18] [ACCESS_VIOLATION] [ADDR:0x115354414B] [PC:0x52A9DFE] [UNABLE_TO_READ] []).

While this particular problem is fixed in Oracle Database 11.1.0.6 patch 10 and 11.1.0.7 patch 7, getting this patch from Oracle took several months. The example is meant to illustrate that queries of increased complexity have a higher likelihood of failing or hanging. Keeping this in mind, Views over Transparent Gateway can be a powerful tool to bridge data availability gaps in the short-term.
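One way to lower that risk, analogous to the Stored Procedure wrapper technique shown in the next section, is to keep the statement sent through the Gateway trivially simple and do the complex work locally. A sketch of that idea (PLAY.TEMP_VALUE_STAGE is a hypothetical Global Temporary Table, not part of the original example):

-- Stage the remote rows with a plain SELECT through the Gateway view
INSERT INTO PLAY.TEMP_VALUE_STAGE (ID_, VALUE_, FILE_DATE)
SELECT ID_, VALUE_, FILE_DATE
FROM PLAY.VALUE_TABLE_SAMPLE
WHERE FILE_DATE <= DATE '2008-12-31';

-- Run the aggregation against the local staging table instead
INSERT INTO PLAY.TEMP_SAMPLE_7445
    (ID_, NAME_, PREV_VALUE, CURR_VALUE, VALUE_SUPPLIER, DATE_VALUE_CHANGED)
SELECT ID_, '', '', '', 'SAMPLE', MAX(FILE_DATE)
FROM PLAY.TEMP_VALUE_STAGE
GROUP BY ID_;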

STORED PROCEDURES EMPLOYING TRANSPARENT GATEWAY

In a similar vein to creating pass-through Views to quickly expose legacy data to the Oracle Data Warehouse, Stored Procedure wrappers can be created to provide an Oracle PL/SQL entry point for legacy stored procedures. This method can be particularly useful in preventing the creation of new application links directly to stored procedures within the legacy Data Warehouse when it is not possible to immediately migrate the logic contained within the stored procedure.

Consider the following Microsoft Transact-SQL stored procedure:

CREATE PROCEDURE dbo.GetScheduleForRange
    @inStartDate DATETIME,
    @inEndDate   DATETIME
AS
    SELECT DATE, DURATION, SESSION_ID, TITLE
    FROM NorthWind..COLLABSCHED
    WHERE DATE BETWEEN @inStartDate AND @inEndDate

Figure 6 : Transact SQL procedure

The following PL/SQL wrapper produces a simple yet effective Oracle entry point for the legacy procedure above:

CREATE OR REPLACE PROCEDURE PLAY.RPT_COLLABORATE_SCHEDULE_RANGE (
    inStart_Date DATE,
    inEnd_Date   DATE,
    RC1 IN OUT SYS_REFCURSOR) IS

    tRC1_MS     SYS_REFCURSOR;
    tDate       DATE;
    tDuration   NUMBER;
    tSession_ID NUMBER;
    tTitle      VARCHAR2(256);
BEGIN
    DELETE FROM PLAY.TEMP_COLLABORATE_SCHEDULE;

    dbo.GetScheduleForRange@MSSQL(inStart_Date, inEnd_Date, tRC1_MS);

    LOOP
        FETCH tRC1_MS INTO tDate, tDuration, tSession_ID, tTitle;
        EXIT WHEN tRC1_MS%NOTFOUND;
        INSERT INTO PLAY.TEMP_COLLABORATE_SCHEDULE
            (DATE_, DURATION, SESSION_ID, TITLE)
        VALUES (tDate, tDuration, tSession_ID, tTitle);
    END LOOP;

    CLOSE tRC1_MS;

    OPEN RC1 FOR
        SELECT DATE_, DURATION, SESSION_ID, TITLE
        FROM PLAY.TEMP_COLLABORATE_SCHEDULE
        ORDER BY SESSION_ID;
END RPT_COLLABORATE_SCHEDULE_RANGE;

Figure 7 : PL/SQL wrapper for legacy procedure

Regardless of the complexity of the body of the Transact-SQL stored procedure, a simple wrapper similar to the one above can be created using only knowledge of the required parameters and the structure of the result set, following a simple five-step formula:

1. Declare Variables for all Transact-SQL Result set columns

2. Call Transact-SQL Procedure

3. Fetch Result one row at a time

4. Insert row to Oracle Temporary Table

5. Open Ref Cursor result set

PITFALLS OF TRANSPARENT GATEWAY (STORED PROCEDURES)

Oracle Data Gateway for Microsoft SQL Server version 11.1.0.6 for Windows 32-bit contains a rather serious bug with respect to calling remote stored procedures that return result sets and actually attempting to retrieve the contents of the result set. Calling the procedure above using an ODBC driver:

{CALL PLAY.RPT_COLLABORATE_SCHEDULE_RANGE('2009-05-06 12:00:00', '2009-05-06 17:00:00')}

Results in ORA-06504: PL/SQL: Return types of Result Set variables or query do not match. This bug is not fixed until 11.1.0.7 Patch 7, which needs to be applied to the Gateway home (assuming the Gateway is installed in a different Oracle home than the Database).

ORACLE STREAMS AS AN ENABLER OF MIGRATION

Proxies for Views and Stored Procedures like the ones shown above can be helpful in making your new Oracle Data Warehouse useful in the early stages of a migration effort. How can you then begin to migrate tables and data to Oracle while still providing a transition period for applications where data is equally available in the legacy Data Warehouse? In any case you will need to start by developing a new load (ETL) process for the Oracle Data Warehouse. Perhaps you could just leave the old ETL process running in Parallel. Employing this approach, reconciliation would be a constant fear unless you have purchased an ETL tool that will somehow guarantee both Data Warehouses are loaded or neither is loaded. A more elegant approach that won’t overload your ETL support people is to employ Oracle Streams Heterogeneous Replication.

Oracle Streams combined with Transparent Gateway allows for seamless Heterogeneous Replication back to the legacy Data Warehouse. Using this approach, the Data Warehouse staff need to build and support only one ETL process, and DBAs support Oracle Streams like any other aspect of the database infrastructure.

ORACLE STREAMS – IF WE BUILD IT, WILL THEY COME?

Old habits die hard for Developers and Business users. Legacy systems have a way of surviving for a long time. How can you motivate usage of the new Oracle Data Warehouse?

A strong set of metadata documentation describing how the new model replaces the old model will be a key requirement in helping to motivate a move toward the new Data Warehouse. Easy-to-read side-by-side tables showing the new vs. old data structures will be welcomed by your developers and users. Try to make these available on paper as well as online, just to make sure you’ve covered everyone’s preference in terms of work habits. Be prepared to do a series of road-shows to display the new standards and a sample of the metadata. You will need to commit to keeping this documentation up to date as you grow your new Data Warehouse and migrate more of the legacy.

Occasionally your development group will find that it can no longer support a legacy application because it is written in a language or manner that no one completely understands any more. You need to make sure standards are put in place early, and have your Architecture Review staff enforce that the newly designed and engineered application accesses only the new Data Warehouse. Try to prioritize your warehouse migration according to the data assets this re-engineered application requires, to minimize exceptions and the need for many View and Stored Procedure proxies.

Some Data Warehouse access will never be motivated to migrate by anything other than a grass roots effort from the Data Warehouse group. You may find that Business Unit developed applications have this characteristic. You should be planning for a certain amount of Data Warehouse staff time that will be spent with owners of these (often smaller departmental) solutions to help users re-target their data access to the new Data Warehouse.

Once in a while, a project will be sponsored that requires a significant overhaul of an application; so much so that the effort is essentially a full re-write. Much like the circumstance of the development group refreshing the technology behind an application, you want to be sure that the right members of the project working group are aware of the new Data Warehouse standards. You should try to help them understand the benefits to the project in order to create an ally in assuring that the proper Warehouse is targeted.

Finally, completely new solutions will be purchased or built. You should aim to be in the same position to have these deployments target the new Oracle Data Warehouse as described in some of the situations above.

ORACLE STREAMS – BUILDING A HETEROGENEOUS STREAM

When building a Heterogeneous Streams setup, the traditional separated Capture and Apply model must be used. Much can be learned about the architecture of Oracle Streams by reading the Oracle Streams Concepts and Administration manual. In a very small nutshell, the Capture Process is responsible for Mining the archive logs and finding/queueing all DML that needs to be sent to the legacy Data Warehouse target. The Apply Process takes from this queue and actually ships the data downstream to the legacy target.

In general, Streams is a very memory hungry process. Be prepared to allocate 2 to 4 gigabytes of memory to the Streams Pool. Explicitly split your Capture and Apply processes over multiple nodes if you are employing RAC in order to smooth the memory usage across your environment. The value that Streams will provide to your Data Warehouse migration strategy should hopefully pay for the cost of the memory resources it requires.
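For example, rather than leaving the allocation to Automatic Shared Memory Management, the Streams Pool can be sized explicitly; a one-line sketch, assuming an spfile is in use (2 GB is the low end of the range suggested above):

ALTER SYSTEM SET streams_pool_size = 2G SCOPE = BOTH;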

ORACLE STREAMS – CAPTURE PROCESS AND RULES

The Capture process is created the same way as any homogeneous Capture process would be and is well described in the manual Oracle Streams Concepts and Administration. This paper will therefore not focus further on the creation of the Capture process, except to show a script that can be used to create an example Capture process called “SAMPLE_CAPTURE” and a Capture rule to capture the table “PLAY.COLLABORATE_SCHEDULE”:

BEGIN
    DBMS_STREAMS_ADM.SET_UP_QUEUE(
        queue_table => 'SAMPLE_STREAM_QT',
        queue_name  => 'SAMPLE_STREAM_Q',
        queue_user  => 'STRMADMIN'
    );
END;
/

BEGIN
    DBMS_CAPTURE_ADM.CREATE_CAPTURE(
        queue_name                => 'SAMPLE_STREAM_Q',
        capture_name              => 'SAMPLE_CAPTURE',
        capture_user              => 'STRMADMIN',
        checkpoint_retention_time => 3
    );
END;
/

Figure 8 : Oracle Streams – Standard Capture

BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name         => 'PLAY.COLLABORATE_SCHEDULE',
        streams_type       => 'CAPTURE',
        streams_name       => 'SAMPLE_CAPTURE',
        queue_name         => 'SAMPLE_STREAM_Q',
        include_dml        => true,
        include_ddl        => false,
        include_tagged_lcr => false,
        inclusion_rule     => true
    );
END;
/

Figure 9 : Oracle Streams – Standard Capture Rule

ORACLE STREAMS – TRANSPARENT GATEWAY CONFIGURATION

Before you begin building the Streams Apply process, a Transparent Gateway Database Link must first be in place. The recommended configuration is to create a separate Database Link for your Streams processes even if applications and users already have a Database Link to the same remote target. Doing so allows you to use different permissions for the Streams user (e.g. the Streams link must be able to write to remote tables, while applications must not write to those same tables or the replication will fall out of sync), and also provides flexibility in configuring, or even upgrading and patching, the gateway for Streams differently than the gateway for applications and users.

Creating and configuring the Database Link for Streams is therefore like any other Database Link, except we will make it owned by the database user STRMADMIN. This example shows a link named MSSQL_STREAMS_NORTHWIND that links to the SQL Server Northwind database on a server named SQLDEV2:

#
# HS init parameters
#
HS_FDS_CONNECT_INFO=SQLDEV2//Northwind
HS_FDS_TRACE_LEVEL=OFF
HS_COMMIT_POINT_STRENGTH=0
HS_FDS_RESULTSET_SUPPORT=TRUE
HS_FDS_DEFAULT_OWNER=dbo

Figure 10 : Text file “initLDB_STREAMS_NORTHWIND.ora”

CREATE DATABASE LINK MSSQL_STREAMS_NORTHWIND
    CONNECT TO STRMADMIN IDENTIFIED BY ********
    USING 'LDB_STREAMS_NORTHWIND';

Figure 11 : DDL to create Database Link MSSQL_STREAMS_NORTHWIND

ORACLE STREAMS – APPLY PROCESS AND RULES

The Streams Apply Process is where the work to send rows to the Heterogeneous target occurs. Each step in the Apply Process and Rules creation/configuration is worth looking at in some detail and so this paper will focus more closely on the Apply Process configuration than previous steps.

When creating a Heterogeneous Apply Process, a Database Link is named. This means that in the design of your Streams Topology, you will need to include at least one Apply Process for each “Database” on the target server. This is especially important to consider when targeting Microsoft SQL Server or Sybase, as a Database in those environments is more like a Schema in Oracle. Below is a script to create a sample Heterogeneous Apply process called “SAMPLE_APPLY_NORTHWIND”:

BEGIN
    DBMS_APPLY_ADM.CREATE_APPLY(
        queue_name          => 'SAMPLE_STREAM_Q',
        apply_name          => 'SAMPLE_APPLY_NORTHWIND',
        apply_captured      => TRUE,
        apply_database_link => 'MSSQL_STREAMS_NORTHWIND'
    );
END;
/

Figure 12 : Oracle Streams – Heterogeneous Apply

In a Heterogeneous Apply situation, the Apply Table Rule itself does not differ from a typical Streams Apply Table Rule. Below is an example of an Apply Table Rule that includes the same table we captured in the sections above, PLAY.COLLABORATE_SCHEDULE, as a part of the table rules for the Apply Process SAMPLE_APPLY_NORTHWIND.

BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name   => 'PLAY.COLLABORATE_SCHEDULE',
        streams_type => 'APPLY',
        streams_name => 'SAMPLE_APPLY_NORTHWIND',
        queue_name   => 'SAMPLE_STREAM_Q',
        include_dml  => true,
        include_ddl  => false
    );
END;
/

Figure 13 : Oracle Streams – Standard Apply Rule

ORACLE STREAMS – APPLY TRANSFORMS – TABLE RENAME

The Apply Table Rename transform is one of the most noteworthy steps in the process of setting up Heterogeneous streams because it is absolutely required, unless you are applying to the same schema on the legacy Data Warehouse as the schema owner of the table in the new Oracle Data Warehouse. It is more likely that you have either redesigned your schemas to be aligned with the current business model, or in the case of a Microsoft SQL Server legacy you have made Oracle Schemas out of the Databases on the SQL Server, and the legacy owner of the tables is “dbo”. You may also have wanted to take the opportunity to create the table in the Oracle Data Warehouse using more accurate or standardized names. Below is an example of an Apply Table Rename transform that maps the new table PLAY.COLLABORATE_SCHEDULE to the legacy dbo.COLLABSCHED table in the Northwind database:

BEGIN
    DBMS_STREAMS_ADM.RENAME_TABLE(
        rule_name       => 'COLLABORATE_SCHEDULE554',
        from_table_name => 'PLAY.COLLABORATE_SCHEDULE',
        to_table_name   => '"dbo".COLLABSCHED',
        step_number     => 0,
        operation       => 'ADD');
END;
/

Figure 14 : Oracle Streams – Apply Table Rename rule

Notice that the rule name in this example is suffixed with the number 554. This number was chosen by Oracle in the Add Table Rules step. You will need to pull it out of the view DBA_STREAMS_RULES after executing ADD_TABLE_RULES, or write a more sophisticated script that captures the rule name in a variable using the overloaded ADD_TABLE_RULES procedure that returns it as an OUT parameter.
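Both options might look roughly like the sketch below (names follow the examples in this paper; Option 2 would be used in place of the earlier ADD_TABLE_RULES call, since the OUT parameters belong to the overloaded signature that creates the rule):

-- Option 1: look the generated rule name up after the fact
SELECT rule_name
FROM   DBA_STREAMS_RULES
WHERE  streams_name = 'SAMPLE_APPLY_NORTHWIND'
AND    streams_type = 'APPLY'
AND    schema_name  = 'PLAY'
AND    object_name  = 'COLLABORATE_SCHEDULE'
AND    rule_type    = 'DML';

-- Option 2: capture the rule name at creation time via the OUT parameters
DECLARE
    tDML_Rule VARCHAR2(30);
    tDDL_Rule VARCHAR2(30);
BEGIN
    DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name    => 'PLAY.COLLABORATE_SCHEDULE',
        streams_type  => 'APPLY',
        streams_name  => 'SAMPLE_APPLY_NORTHWIND',
        queue_name    => 'SAMPLE_STREAM_Q',
        include_dml   => true,
        include_ddl   => false,
        dml_rule_name => tDML_Rule,
        ddl_rule_name => tDDL_Rule);
    DBMS_OUTPUT.PUT_LINE('DML rule: ' || tDML_Rule);
END;
/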

One final note about the Rename Table transform: it is not possible to Apply to a Heterogeneous target table whose name is in mixed case. For example, Microsoft SQL Server allows for mixed-case table names. You will need to have your DBAs change the table names to upper case on the target before the Apply process will work. Luckily, Microsoft SQL Server is completely case insensitive when it comes to the use of the tables, and so while changing the table names to upper case may make a legacy “Camel Case” table list look rather ugly, nothing should functionally break as a result of this change.
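On the SQL Server side, the rename itself can be as simple as the following sketch (assuming, purely for illustration, that the legacy table had originally been created with the mixed-case name CollabSched):

EXEC sp_rename 'dbo.CollabSched', 'COLLABSCHED';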

ORACLE STREAMS – APPLY TRANSFORMS – COLUMN RENAME

The Column Rename transform is similar in nature to the Table Rename transform. Notice in the example below how a column is being renamed because the legacy table contains a column named “DATE”, which is disallowed in Oracle as a column name because DATE is a key word (a data type). The same restriction applies to column names as to table names in a Heterogeneous Apply configuration: all column names on the target must be in upper case. Again, this should have no impact on your legacy code, as systems that allow mixed-case column names such as Microsoft SQL Server are typically not case sensitive when using the column.

BEGIN
    DBMS_STREAMS_ADM.RENAME_COLUMN(
        rule_name        => 'COLLABORATE_SCHEDULE554',
        table_name       => 'PLAY.COLLABORATE_SCHEDULE',
        from_column_name => '"DATE_"',
        to_column_name   => '"DATE"',
        value_type       => '*',
        step_number      => 0,
        operation        => 'ADD');
END;
/

Figure 15 : Oracle Streams – Apply Column Rename rule

ORACLE STREAMS – EXERCISING THE STREAM

Assuming we have tables set up in both the legacy and new Data Warehouse that have only the table name and one column name difference in terms of structure, the steps above are sufficient to now put the Stream into action. Streams has no ability to synchronize tables that are out of sync. Before setting up the Stream you must ensure that the table content matches exactly. Let’s assume for now that you are starting with zero rows and plan to insert all the data after the Stream is set up. The screenshot below illustrates for this example that the legacy target table on Microsoft SQL Server is empty:

Figure 16 : Oracle Streams – Empty Microsoft SQL Server target table
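Before enabling the Stream, a quick count comparison across the gateway is one way to confirm the two sides really do match; a sketch (it assumes a Database Link to the Northwind database is available to your session, and uses the Streams link from the previous section purely for illustration):

SELECT (SELECT COUNT(*) FROM PLAY.COLLABORATE_SCHEDULE)           AS oracle_rows,
       (SELECT COUNT(*) FROM COLLABSCHED@MSSQL_STREAMS_NORTHWIND) AS legacy_rows
FROM   dual;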

Below is a rudimentary script showing some seed data being inserted into the Oracle table. You would of course want to use a more sophisticated approach such as SQL*Loader; however, this sample is meant to be simple for the purposes of understanding and transparency:

SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_, DURATION, SESSION_ID, TITLE)
     VALUES ('2009-05-06 08:30:00', 60, 359, 'Oracle Critical Patch Updates: Insight and Understanding');
1 row inserted

SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_, DURATION, SESSION_ID, TITLE)
     VALUES ('2009-05-06 11:00:00', 60, 237, 'Best Practices for Managing Successful BI Implementations');
1 row inserted

SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_, DURATION, SESSION_ID, TITLE)
     VALUES ('2009-05-06 12:15:00', 60, 257, 'Best practices for deploying a Data Warehouse on Oracle Database 11g');
1 row inserted

SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_, DURATION, SESSION_ID, TITLE)
     VALUES ('2009-05-06 13:30:00', 60, 744, 'Business Intelligence Publisher Overview and Planned Features');
1 row inserted

SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_, DURATION, SESSION_ID, TITLE)
     VALUES ('2009-05-06 15:15:00', 60, 387, 'Migrating a Data Warehouse from Microsoft SQL Server to Oracle 11g');
1 row inserted

SQL> INSERT INTO PLAY.COLLABORATE_SCHEDULE (DATE_, DURATION, SESSION_ID, TITLE)
     VALUES ('2009-05-06 16:30:00', 60, 245, 'Data Quality Heartburn? Get 11g Relief');
1 row inserted

SQL> COMMIT;
Commit complete

Figure 17 : Oracle Streams – Inserting to the new Oracle Data Warehouse Table

After allowing the Capture and Apply processes a few seconds to catch up, re-executing the query from above on the legacy Data Warehouse shows that the rows have been replicated through Streams to the target.

Figure 18 : Oracle Streams – Populated Microsoft SQL Server target Table
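If the rows do not appear, the usual places to look are the Capture state, the Apply status, and the Apply error queue; a monitoring sketch, run as the Streams administrator:

-- Capture should not be stuck; a healthy process typically reports CAPTURING CHANGES
SELECT capture_name, state FROM V$STREAMS_CAPTURE;

-- Apply should be ENABLED, and the error queue should stay empty
SELECT apply_name, status FROM DBA_APPLY;
SELECT apply_name, error_message FROM DBA_APPLY_ERROR;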

ORACLE STREAMS - STREAMS SPEED AND SYNCHRONIZING TABLES IN ADVANCE

While the ability of Oracle Streams to seamlessly replicate data to a heterogeneous legacy target is phenomenal, Streams, and especially Heterogeneous Streams over Transparent Gateway, won’t be knocking your socks off in terms of speed. At best, with today’s hardware, you will see 500-600 rows per second flowing through to the target. In a fully built up Data Warehouse, you’re more likely to see 100-200 rows per second. Hopefully you’ll be able to engineer your ETL processes so that this limited speed won’t be an issue, due to the incremental nature of the Data Warehouse. But let’s say your Data Warehouse table needs to be seeded with 2 million rows of existing data. The smarter way to start in this case is to synchronize the tables before setting up the Stream. This approach comes with some extra considerations, outlined below.

ORACLE STREAMS – EMPTY STRINGS VS. NULL VALUES

Microsoft SQL Server treats empty strings as distinct from NULL values. Oracle on the other hand does not. If you synchronize your tables outside of Streams, you must ensure there are no empty strings in the Microsoft SQL Server data before doing so. If you find a column that contains empty strings, there may be some leg work required in advance to make sure there are no consuming systems that will behave differently if they see a NULL instead of an empty string.
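A quick way to find out whether a column is affected is a check like the following on the legacy side before extracting seed data (a sketch; the table and column are the samples used in this paper):

SELECT COUNT(*) AS empty_titles
FROM dbo.COLLABSCHED
WHERE TITLE = '';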

ORACLE STREAMS – SYNCHRONIZING TABLES CONTAINING FLOATS

One of the simplest ways one can imagine to synchronize the new Oracle Data Warehouse table with the legacy table is to use an Insert/Select statement to select the data over Transparent Gateway and insert it into the Oracle target. A set operation via Transparent Gateway will, after all, work orders of magnitude faster than Streams operating row by row. Unfortunately, if your data contains Float or Real columns in Microsoft SQL Server, this method will not work due to a limitation in Transparent Gateway. This limitation is best illustrated with an example. Below is a sample of a couple of floating point numbers being inserted to a Microsoft SQL Server table. Notice the final two digits of precision:

Figure 19 : Oracle Streams – Floats in Microsoft SQL Server

Now have a look at the very same table selected via Oracle Transparent Gateway. Notice how in either case, using the default display precision or explicitly forcing Oracle to show us 24 digits of precision, the last two digits of precision are missing when compared to the Select done straight on the SQL Server above:

Figure 20 : Oracle Streams – Floats over Transparent Gateway

A fact that is unintuitive and yet undeniably clear once you begin working with Heterogeneous Streams: the manner in which Oracle Streams uses Transparent Gateway will require the digits of precision that are missing from the Gateway Select statement. If you were to sync up the table shown above to an equivalent Oracle table using an Insert/Select over Transparent Gateway, set up a Capture and Apply process linking the tables, and finally delete from the Oracle side, the Streams Apply Process would fail with a “No Data Found” error when it went to find the SQL Server rows to delete.

The most reliable way to synchronize the two sides in preparation for Streams is to extract the rows to a comma separated value file, and then use SQL*Loader to import the data to Oracle. Below is an example of using Microsoft DTS to generate the CSV file, along with proof that the CSV file contains all of the required digits of precision:

Figure 21 : Microsoft DTS used to extract seed data to CSV

Figure 22 : CSV file produced by DTS contains full precision
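The load into Oracle might then be driven by a SQL*Loader control file along these lines (a sketch using the sample schedule table from this paper; the file name and date mask are assumptions that would need to match the actual extract):

-- collabsched.ctl (hypothetical control file)
LOAD DATA
INFILE 'collabsched.csv'
INTO TABLE PLAY.COLLABORATE_SCHEDULE
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
  DATE_      DATE "YYYY-MM-DD HH24:MI:SS",
  DURATION,
  SESSION_ID,
  TITLE
)

The load itself would then be run from the command line, for example: sqlldr userid=play control=collabsched.ctl log=collabsched.log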

CONCLUSION

Committing to a new Data Warehouse technology is a difficult decision for an organization to make. The effort in executing the migration is costly in terms of time and resources. Remember to respect these facts when making your case to management. Remain confident in your recommendations and plan, but unfold these in a paced fashion that allows you time to build your message and allows those around you the space to come to terms with the requirements.

While migration tools such as Oracle Migration Workbench can help with the migration of certain Data Warehouse assets, the bigger challenge comes in executing a seamless migration over a period of time. Focus on your strategy to enable a new Oracle Data Warehouse while maintaining reliable service to the legacy over your parallel period.

Employ tools such as Oracle Transparent Gateway or Oracle Heterogeneous Streams to enable your migration strategy, but be prepared to weather the storm. Because these products are more niche than the core features of the Oracle Database, limitations and product bugs will surface along the way.

Finally, old habits will be hard to break for your developers and business users. Be sure to consider the standards, metadata, education, and mentoring that your consumers will require in order to make your new Oracle Data Warehouse deployment an overwhelming success.
