Oracle Data Integrator
How ODI Originated:-
• In October 2006, Oracle acquired Sunopsis and rebranded its product as Oracle Data Integrator (ODI).
• The main aim behind ODI was to enhance the Oracle Fusion Middleware offering, which requires support for heterogeneous sources and targets. Even after the purchase, Oracle continued to offer ODI separately from its earlier ETL product, Oracle Warehouse Builder.
What is Oracle Data Integrator
Oracle Data Integrator is a data integration tool that follows the ELT (Extract, Load and Transform) approach. It provides a graphical interface for users to build, manage and maintain data integration processes. Oracle Data Integrator provides a blend of solutions for building, deploying and managing real-time, data-centric architectures in SOA, BI and data warehouse environments.
ODI provides:-
• High performance.
• High-volume data loading and data transformation.
ODI can be used to support a variety of data integration projects, such as the following:-
• Business Intelligence and Data Warehousing
• Data Migrations and Data Consolidations
• Master Data Management
Oracle Data Integrator 12.1.2.0.0
• Extract:- copies the data from the source system to the staging area
• Transform:- reformats the data for the warehouse, with business calculations applied
• Load:- copies the data from the staging area into the warehouse
ETL vs ELT
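• Extract: data is extracted from the sources.
• Transform: the data is transformed to fit the need.
• Load: the data is loaded into the end target database or data warehouse.
[Figure: ETL. Three SOURCE systems feed an ETL Hub Server, which feeds the DW Server; the stages are labeled Extract, Transform, Load.]
[Figure: ELT. Three SOURCE systems feed the DW Server directly.]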
ETL: data is extracted from different sources, transformed on a separate engine and then loaded into the DW database. In ELT, on the other hand, the extracted data is inserted into a single database, which handles both the load and the transformation.
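The difference between the two flows can be sketched in a few lines of Python. This is only an illustration, not how ODI itself is implemented: an in-memory SQLite database stands in for the DW server, and the table names, the sample rows and the cents-to-amount conversion are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount_in_cents).
SOURCE_ROWS = [(1, 1250), (2, 399), (3, 10000)]


def run_etl(target):
    """ETL: transform outside the target, then load the finished rows."""
    # The transformation runs here, in the "hub" (this Python process).
    transformed = [(order_id, cents / 100.0) for order_id, cents in SOURCE_ROWS]
    target.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL)")
    target.executemany("INSERT INTO orders_etl VALUES (?, ?)", transformed)


def run_elt(target):
    """ELT: load the raw rows first, then let the target engine transform them."""
    target.execute("CREATE TABLE staging_orders (order_id INTEGER, cents INTEGER)")
    target.executemany("INSERT INTO staging_orders VALUES (?, ?)", SOURCE_ROWS)
    # The transformation is plain SQL executed by the target database itself.
    target.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, cents / 100.0 AS amount FROM staging_orders"
    )


def load_both():
    """Run both flows against one in-memory target and return their results."""
    target = sqlite3.connect(":memory:")
    run_etl(target)
    run_elt(target)
    etl = target.execute(
        "SELECT order_id, amount FROM orders_etl ORDER BY order_id"
    ).fetchall()
    elt = target.execute(
        "SELECT order_id, amount FROM orders_elt ORDER BY order_id"
    ).fetchall()
    target.close()
    return etl, elt
```

Both flows end with the same rows in the target; what changes is where the transformation runs, which is why ELT needs no separate transformation engine.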
Advantage of ELT:-
• No separate transformation engine is required; the work is done by the target engine itself.
• Data transformation and loading happen in parallel, so less time and fewer resources are spent.
• ELT works with high-end data engines such as Hadoop clusters and cloud platforms, which can provide additional performance and security.
ODI ARCHITECTURE
ODI follows a four-tier architecture. It consists of:-
• Desktop
• WebLogic Server
• Repositories
• Sources and Targets
ODI consists of the following components:-
Repositories
The repository is the primary component in the ODI architecture. It contains design-time objects (packages, interfaces or mappings) and run-time objects (scenarios), plus sessions.
There are two types of repositories:-
• Master Repository
• Work Repository
Master Repository
It contains the information handled by the topology and security navigators.
• Topology:- information related to source and target connections, such as technologies, physical and logical schemas, contexts, agents, languages, JDBC URLs, usernames, etc.
• Security:- all the information handled by the security navigator, such as users, profiles and access privileges.
• Versions:- whenever a new version of an object is created, the master repository keeps a record of it.
This repository contains sensitive information.
Work Repository
It manages the information handled by the designer and operator navigators, that is, information corresponding to developer activities, such as:-
• Projects, i.e. packages, procedures, variables, sequences and mappings.
• Scenarios, load plans, schedules, and source and target metadata.
A work repository cannot exist alone; it depends on a master repository for its existence. We can have multiple master repositories, and one master repository can have multiple work repositories, but a work repository can have one and only one master repository.
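The one-to-many relationship described above can be modelled as a small sketch. The class names, the attach method and the repository names are hypothetical and are not part of any ODI API; the point is only that a work repository cannot be created without its master.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRepository:
    """Hypothetical model: one master can own many work repositories."""
    name: str
    work_repositories: list = field(default_factory=list)

    def attach(self, work_name):
        """Create a work repository that belongs to this master."""
        work = WorkRepository(name=work_name, master=self)
        self.work_repositories.append(work)
        return work


@dataclass
class WorkRepository:
    """Hypothetical model: a work repository has exactly one master."""
    name: str
    # No default value: a work repository cannot exist without a master.
    master: MasterRepository = field(repr=False, compare=False)
```

For example, `MasterRepository("MASTER_DEV").attach("WORK_DEV")` returns a work repository whose `master` is the object it was attached to, while `WorkRepository(name="WORK_DEV")` on its own raises a `TypeError` because the master is missing.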
ODI Studio
ODI Studio is the medium through which we access the master and work repositories. It provides four navigators, through which we can manage the different aspects and steps of an ODI project:-
• Designer
• Operator
• Topology Manager
• Security Manager
Designer
The Designer navigator manages models and projects. Project development tasks are done in this navigator: here we graphically develop and maintain objects such as mappings and packages. Metadata about the projects is also defined in this navigator, and the code it generates can be customized. Scenarios and load plans can also be executed from this navigator.
Operator
This navigator is used for production management and monitoring. Here we can monitor the step-by-step execution of a scenario (i.e. a mapping, package or procedure) or of a load plan.
When a scenario is executed, the agent creates a session in the repository. Sessions are made up of steps, and steps are made up of tasks. Sessions are organized by date, status, physical agent, etc. They are arranged in a hierarchical order, which makes debugging easier.
Topology Manager
The Topology navigator describes the logical and physical architecture of the information system. This navigator contains information about technologies, physical and logical schemas, contexts, agents and languages.
Security Manager
As the name suggests, the Security Manager helps to manage the security of ODI. User profiles and roles are created in this navigator, and privileges are assigned to those profiles and roles.
Run Time Agents
Agents are required for the execution of scenarios. An agent does not hold all the information about a session, only the basic information; the remaining details of the session are stored in the repository.
Whenever a scenario is executed on an agent, the agent creates a session in the repository. The agent then reads the tasks for that session from the repository, processes them and writes the results back to the repository.
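The create-session, read-tasks, write-results cycle described above can be sketched as follows. This is a simplified illustration under stated assumptions: the Repository and Agent classes, the status strings and the scenario name used in the usage note are hypothetical, and an in-memory dictionary stands in for the real repository database.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for the repository."""
    scenarios: dict  # scenario name -> list of task callables
    sessions: dict = field(default_factory=dict)  # session id -> session record

    def create_session(self, scenario):
        """Register a new session for the scenario and return its id."""
        session_id = len(self.sessions) + 1
        self.sessions[session_id] = {
            "scenario": scenario,
            "status": "RUNNING",
            "results": [],
        }
        return session_id


class Agent:
    """Keeps only a reference to the repository; session details live there."""

    def __init__(self, repository):
        self.repository = repository

    def run(self, scenario):
        repo = self.repository
        # 1. Create the session in the repository.
        session_id = repo.create_session(scenario)
        session = repo.sessions[session_id]
        try:
            # 2. Read the tasks from the repository and process them in order.
            for task in repo.scenarios[scenario]:
                # 3. Write each result back to the repository.
                session["results"].append(task())
            session["status"] = "DONE"
        except Exception as exc:
            session["status"] = "ERROR"
            session["results"].append(repr(exc))
        return session_id
```

Running `Agent(repo).run("LOAD_SALES")` against a repository whose "LOAD_SALES" scenario holds two task callables leaves a session record with status "DONE" and two results in `repo.sessions`, which is the kind of record a monitoring tool could then display.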
There are 2 types of agents:-
• Standalone Agent
• Java EE agent
Standalone Agent:-
A standalone agent is lightweight because it does not require an application server. This makes it easy to install on a server that is already active or running.
A standalone agent requires only a Java virtual machine to work. It connects to the work repository and to the source and target data servers through JDBC.
Java EE agent:-
Since the whole execution process depends on the agent, it is important that the agent is up all the time: if an agent is down when a scheduled job tries to run, the job fails. This is the main drawback of the standalone agent.
The Java EE agent, by contrast, is a web application and can be deployed on several nodes. All the scheduled jobs are stored in a Coherence cache, so if one node is down, another can perform its job. It makes use of the benefits provided by the application server.
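The failover idea can be illustrated with a short sketch. It is not how the Java EE agent or Coherence is actually implemented: the Node class, the run_scheduled_job function and the job and node names are hypothetical, and a plain list of nodes stands in for the cluster.

```python
class Node:
    """Hypothetical cluster node that may be up or down."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up

    def execute(self, job):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{job} ran on {self.name}"


def run_scheduled_job(job, nodes):
    """Try each node in turn; the first healthy node runs the job."""
    for node in nodes:
        try:
            return node.execute(job)
        except ConnectionError:
            continue  # this node is down, so fall through to the next one
    raise RuntimeError(f"no node available for {job}")
```

With two nodes where the first is down, the job still runs on the second. With a single node that is down, which corresponds to the standalone case, the job fails.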
ODI Console
It is a web-based user interface where you can view and edit topology objects, manage scenarios and monitor sessions.