8/10/2019 Dm 1205datastageopsdb1 PDF
1/22
Copyright IBM Corporation 2012 Trademarks
Get started with the IBM InfoSphere DataStage and QualityStage
Operations Console Database, Part 1: An introduction
Page 1 of 22
Get started with the IBM InfoSphere DataStage and
QualityStage Operations Console Database, Part 1: An
introduction
A deep dive into the key relationships of the schema that defines the
Operations Database
Len Greenwood([email protected])DataStage Core Architect
IBM
Arron Harden([email protected])
Senior Software Engineer
IBM
Geoff McClean([email protected])
Senior Software Developer
IBM
Sumit Kumar([email protected])
Senior Software Developer
IBM
17 May 2012
This article is a deep dive into the schema of the IBM InfoSphere DataStage and
QualityStage Operations Database, and the tables and columns that make up its key
relationships. Specimen SQL queries are included to demonstrate how data can be read
from these tables to answer specific operational questions. You can adapt these to build, for
example, custom reports based on the operational data collected at your particular DataStage
and QualityStage installation.
View more content in this series
Introduction
The DataStage and QualityStage Operations Database (DSODB) is designed as a relational
schema whose tables contain information about job runs and system resources used on a system
http://www.ibm.com/legal/copytrade.shtmlhttp://www.ibm.com/legal/copytrade.shtmlhttp://www.ibm.com/legal/copytrade.shtmlhttp://www.ibm.com/legal/copytrade.shtmlhttp://www.ibm.com/legal/copytrade.shtmlhttp://www.ibm.com/developerworks/views/data/libraryview.jsp?search_by=infosphere+datastage+operations+consolemailto:[email protected]:[email protected]:[email protected]:[email protected]://www.ibm.com/developerworks/ibm/trademarks/http://www.ibm.com/legal/copytrade.shtml8/10/2019 Dm 1205datastageopsdb1 PDF
2/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 2 of 22
that has a DataStage engine installed. This information is used to drive the Operations Console,
but can also be queried directly given some knowledge of the key columns of the tables and the
relationships among them.
This article describes the main tables and views of the DSODB, including all columns used
as primary or natural keys, and all foreign key relationships defined. It gives examples of SQLSELECT statements, which can be used to answer specific questions based on the information
held by DSODB.
Online reference material for DSODB 8.7, including descriptions of all current tables and columns,
can be found in the schema document reference in the Resourcessection.
Table diagrams
Some pictures are included to illustrate the relationships between various tables. Each box
represents a table or view. Relevant column names are shown, with icons against each column to
show their use as follows (Figure 1).
Figure 1. Key to types of column
Note that only the columns of each table that are mentioned in the text of this article are shown.
For a complete list of table columns, consult the schema document reference in the Resources
section.
The arrows between tables show the direction of relationships established by foreign keys and
SQL views, as shown next (Figure 2).
Figure 2. Key to relationships
The HOST table
The HOST table serves to partition the whole schema so information collected from multiple
engines can be stored in a single database schema if required (Figure 3).
8/10/2019 Dm 1205datastageopsdb1 PDF
3/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 3 of 22
Figure 3. HOST table keys
Each row in the HOST table represents a system on which all or part of an InfoSphere Information
Server engine is installed. The HOSTID column is a surrogate primary key; The HostName column
is the system name as derived from the hostnameoperating system command or the HA_ALIAS
environment variable, if that is set in the environment of the EngMonApp process when it starts.
For systems acting as a conductor node and on which an instance of the EngMonApp process
is running, the InstallationDir column is set to a path name. This is the home directory for the
Information Server engine installation, such as /opt/IBM/InformationServer/Server/DSEngine. The
key of the HOSTS table is the combination of HostName and InstallationDir, which allows for more
than one engine to be installed on the same host system.
The CreatedTimestamp column is the UTC time at which EngMonApp inserted the row. Each
time a ResMonApp instance starts, it looks to see if there is a row in the table with the values of
HostName and InstallationDir it requires, and if so, it updates the MonStartTimestamp column;
otherwise, it creates a new row. ResMonApp also creates new HOST entries, as required.
Remote nodes have InstallationDir set to a single hyphen character, '-'. These are systems whose
HostNames have been defined in the DSODBConfig.cfg file via the ResourceNode property. Theserows have a CreatedTimestamp, but no MonStartTimestamp entry.
Cascading delete constraints are set so that deleting a row in HOST removes all rows that point
back to it. Therefore, records of any jobs that ran on that system and all resource usage records
are deleted at the same time.
Tables related to job runs
This section describes the tables used to record job run information in the default configuration.
Note that all such rows ultimately relate to an entry in the HOSTS table (Figure 4). To completely
identify a run via a SQL SELECT, so it is necessary to join on the HOSTID column if there are runsfrom more than one engine recorded in the database. (The SQL examples below assume that is
the case.)
8/10/2019 Dm 1205datastageopsdb1 PDF
4/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 4 of 22
Figure 4. Key columns and relationships between tables related to JOBRUN
The JOBEXEC table
Every time a new instance of a DataStage and QualityStage job run starts, it creates an initial
event describing which version of a job executable has started, when, and what its invocation ID
string is in the case of a multi-instanceable job. The JOBEXEC table holds details about each
unique job executable for which a run has been seen. Note that this might not be the same list as
would be found by querying the Metadata Repository; a job therein might never have been run, or
it might no longer exist if it has been deleted from the repository since it last ran.
Rows have a surrogate key JOBID used to relate all runs and other details of this particular versionof the job. Job names are unique within a DataStage project, and project names are unique for
a given host. Whenever a job is compiled, the job's executable has potentially been changed,
so one can distinguish between versions of a job's executable by including the compilation time.
Hence, the unique primary key for the table is a combination of the ProjectName, JobName, and
CompilationTimestamp columns, combined with the HOSTID as a foreign key to the HOST table to
identify the system where the project is located.
These rows are only created once. Each run of a job either finds an existing row put there by a
previous run of the same version of the job or creates a new one.
Listing 1. SQL Example:Querying the JOBEXEC table-- List names and locations of all job sequences that have
-- ever been run on host H.
SELECT
X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp
FROM
DSODB.JOBEXEC AS X
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND X.JobType = "SEQ"
ORDER BY X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp
8/10/2019 Dm 1205datastageopsdb1 PDF
5/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 5 of 22
Cascading delete constraints are set so that deleting an entry in JOBEXEC removes all records to
do with any runs of that version of the job.
Listing 2. SQL Example:Deleting from the JOBEXEC table
-- Delete all monitored information for all jobs that have
-- been run from project P on host H.DELETE
FROM
DSODB.JOBEXEC
WHERE
JOBID IN (
SELECT
X.JOBID
FROM
DSODB.JOBEXEC AS X
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND X.ProjectName = "P"
)
The JOBRUN table
One row is inserted in this table for each monitored run of a particular version of a job. It is updated
as the job run progresses. The RUNID column is a surrogate primary key, used to relate run
details from other tables (such as job parameters and log messages). The JOBID column is a
foreign key to the row in the JOBEXEC table that contains the description of the exact version of
the job executable that is running which host is started on, in which project, the job's name,
and its compilation time. The other columns that form the unique primary key are InvocationId
and CreationTimestamp. The former is only relevant for jobs with the multi-instanceable property
set; this allows more than one instance of a job to be running at once, as long as each instance
has a different invocation ID, which is an arbitrary string assigned by the user at runtime. Notethat if a run does not have an invocation ID string, the value is set to a single hyphen. The
CreationTimestamp is the UTC time when the monitoring system created the first run event. Note
that this might be different from the RunStartTimestamp, in the case of a run that gets queued. The
RunStartTimestamp is the UTC time that the engine really launched the run, as opposed to when it
was first submitted for running.
If a run is started from a job sequence, or via job control from another job, the
CONTROLLING_RUNID column contains a foreign key to another entry in JOBRUN, which is the
run that started this one. Therefore, all runs started by a particular instance of a sequence can be
related by their common CONTROLLING_RUNID, and these potentially form a tree-structured setof relationships if job sequences are nested.
As a run progresses, it generates other events when its status changes or when it produces new
values for some of its monitored properties. These cause its row in JOBRUN to be updated, and
the LastUpdateTimestamp column records the UTC time that last happened.
A note about run statuses: there are two columns in the JobRun table that hold status values:
RunMajor Status and RunMinorStatus. The major status can be used to quickly filter those runs
that have finished from those that are starting up or still running. The minor status gives the
8/10/2019 Dm 1205datastageopsdb1 PDF
6/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 6 of 22
specific status for the run (see the schema document reference in the Resourcessection for all
possible values).
Listing 3. SQL Example:Querying the JOBRUN table - 1-- List start and finish times, names, and statuses of all jobs that have
-- been run on host H, started after time YYYY-MM-DD HH:MM:SS,-- and have finished.
SELECT
X.ProjectName, X.JobName,
R.RunStartTimestamp, R.RunEndTimestamp, R.RunMinorStatus
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND R.RunMajorStatus = "FIN"
AND R.RunStartTimestamp >= "YYYY-MM-DD HH:MM:SS"
ORDER BY R.RunStartTimestamp
Listing 4. SQL Example:Querying the JOBRUN table - 2-- List names of all jobs on host H that were running at time T and sort them
-- in descending order of total CPU usage for the whole run.
SELECT
X.ProjectName, X.FolderPath, X.JobName,
R.RunStartTimestamp, R.RunEndTimestamp, R.TotalCPU
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND R.RunStartTimestamp = "YYYY-MM-DD HH:MM:SS"
OR R.RunEndTimestamp IS NULL )
ORDER BY R.TotalCPU DESC
Cascading delete constraints are set so that deleting an entry in JOBRUN deletes all related
details of the run (parameters, logs, metrics, etc.). If the run started other runs (if it was a job
sequence, for example), all those runs are also deleted, along with their details.
Listing 5. SQL Example:Deleting from the JOBRUN table-- Delete all job runs from project P on host H
-- that finished normally.
DELETE
FROM
DSODB.JOBRUN
WHERE
RUNID IN (
SELECT
R.RUNID
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND X.ProjectName = "P"
AND R.RunMinorStatus = '"FOK"
)
8/10/2019 Dm 1205datastageopsdb1 PDF
7/22
8/10/2019 Dm 1205datastageopsdb1 PDF
8/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 8 of 22
Listing 7. SQL Example:Querying the JOBRUNLOG table
-- List job details and message info for all runs on host H that have
-- emitted fatal messages since time YYYY-MM-DD HH:MM:SS.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp,
L.LogTimestamp, L.MessageId, L.MessageText
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBRUNLOG AS L ON R.RUNID = L.RUNID
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND L.LogType = "FAT"
AND R.RunStartTimestamp > "YYYY-MM-DD HH:MM:SS"
ORDER BY R.RunStartTimestamp, R.RUNID, L.EventID
Constraints are set so that deleting an entry in JOBRUN deletes all corresponding entries in
JOBRUNLOG.
The PARALLELCONFIG table and PARALLELCONFIGNODES view
Parallel job runs always specify a path to a file containing configuration information for the
parallel engine (via the APT_CONFIG_FILEenvironment variable). The path is recorded in the
ConfigFileName column of the JOBRUN table. However, that path might point to a temporary file,
or the contents of the file my have changed since the job ran. So the monitoring system also reads
and parses the file and stores information about how many nodes the run was configured for in the
PARALLELCONFIG table.
Since many runs can use the same configuration file, or at least files whose parallelization
instructions are equivalent, the schema saves space by making all runs with equivalentconfigurations point to the same row. So the CONFIGID column in this table is a surrogate key and
the CONFIGID column on the JOBRUN table is a foreign key to it. The PARALLELCONFIG table
also has HOSTID as a foreign key, so configurations are partitioned by host that the run started on.
The list of physical and logical nodes in the configuration is held as an XML column. Use the
PARALLELCONFIGNODES view to query the physical name of each node in the configuration and
the number of logical nodes assigned to each. (Note that the NodeListHash column is used simply
to find whether an existing row with the same configuration exists, since the XML column cannot
be used directly in a lookup.)
8/10/2019 Dm 1205datastageopsdb1 PDF
9/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 9 of 22
Listing 8. SQL Example:Querying the PARALLELCONFIGNODES view-- Get the names of all job runs started on host H that used a node named N
-- and ran between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS.
SELECT
R.RunStartTimestamp, X.ProjectName, X.JobName
FROM
DSODB.JOBRUN AS R
JOIN DSODB.PARALLELCONFIGNODES AS N ON R.CONFIGID = N.CONFIGIDJOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
WHERE
H.HostName = 'H'
AND N.PhysicalName = "N"
AND R.RunStartTimestamp >= "YYYY-MM-D1 HH:MM:SS"
AND R.RunEndTimestamp
8/10/2019 Dm 1205datastageopsdb1 PDF
10/22
8/10/2019 Dm 1205datastageopsdb1 PDF
11/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 11 of 22
Figure 6. Key columns and relationships between tables related to
JOBRUNSTAGE and JOBRUNLINK
The JOBSTAGE table
A row appears in this table for each stage to be reported on for a job run. It is created if it does
not exist and relates back to the JOBEXEC entry for the current run via its JOBID foreign key. The
entry contains static information about the stage, such as name, description and stage type. Note
that since stage names are only unique within the top level of the job, or a container if they are
at a lower level, there is a ContainerPath column that might need to be used to fully identify thestage as well as the StageName column. Each row is given a surrogate key so it can be joined to
its links.
The JOBLINK table
This contains rows for each link determined to be a source or target link for an overall job that
has been run. It uses foreign key FROMSTAGEID and TOSTAGEID to point back to the row in
JOBSTAGE table that describes the stage it connects to in its role as a source or target. This
depends on the setting of the IsSource and IsTarget columns; in general, only one is set and only
one of the foreign keys is correspondingly related.
The LinkName is unique, in combination with FROMSTAGEID/TOSTAGEID, since a stage cannot
have two links with the same name.
The JOBRUNSTAGE table
Every run that is monitored in this mode generates one row per stage, to record, among other
things, the number of CPU milliseconds that can be ascribed to the stage. A parallel job might
be running each stage in a partitioned mode, which involves more than one process. Therefore,
the NumInstances column might be greater than 1, and the InstanceCPUList might be a comma-
8/10/2019 Dm 1205datastageopsdb1 PDF
12/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 12 of 22
separated string of numbers rather than a single value. But it is easier to select the TotalCPU
column, which gives the overall value by summing entries in the list, if necessary.
Listing 11. SQL Example:Querying the JOBRUNSTAGE table -- List the stages of jobs run on host H in descending order of CPU used.
SELECT X.ProjectName, X.JobName, R.RunStartTimestamp,
JS.ContainerPath, JS.StageName, RS.TotalCPU
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID
JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
WHERE
H.HostName = "H"
ORDER BY RS.TotalCPU DESC
The JOBRUNLINK table
Every run of a job that has source or target links also generates one row per such link. Thisrecords the number of rows that passed down the link (these rows contribute to the "consumed" or
"produced" total row counts if the link is marked as a source or target, respectively). As for stages,
parallelization is recorded via a comma-separated list of row counts per partition, where relevant,
and the TotalRows column is always the sum over all partitions.
Listing 12. SQL Example:Querying the JOBRUNLINK table -- List the links, and the stages to which they are attached, of jobs
-- run on host H in descending order of number of rows produced.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp,
JS.ContainerPath, JS.StageName, JL.LinkName, RL.TotalRows
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID
JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
JOIN DSODB.JOBLINK AS JL ON
(JS.STAGEID = JL.FROMSTAGEID OR JS.STAGEID = JL.TOSTAGEID)
JOIN DSODB.JOBRUNLINK AS RL ON R.RUNID = RL.RUNID
WHERE
H.HostName = "H"
AND JL.IsTarget = 1
ORDER BY RL.TotalRows DESC
The DATALOCATOR tableA link can also record "locator" information. This involves providing eight strings that identify the
location of an external data resource. So for a sequential file stage, these strings should identify a
file-system path name; for a database table, this should be a database type, name, schema, and
table name. Exact details depend on the stage type. For each unique set of identifiers, there is a
separate row in the DATALOCATOR table, with a surrogate key in the LOCATORID column. Each
link that has such information attempts to see if a row with the same identifiers has been created
and creates it if not. The LOCATORID column in the JOBRUNLINK table is then a foreign key to
that row.
8/10/2019 Dm 1205datastageopsdb1 PDF
13/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 13 of 22
Table 1. The DATALOCATOR columns are used as follows
Column name Usage
ComputerName Name of system that hosts the resource
SoftwareProductName Name of product that manages the resource
DataStoreSubClass Overall type of the data store
DataStoreName Overall name of database or path to top level of resource
DataSchemaSubClass Type of next level of resource if applicable
DataSchemaName Name of next level of resource if applicable
DataCollectionSubClass Type of lowest level of resource
DataCollectionName Name of lowest level of resource, such as table or file
Listing 13. SQL Example:Querying the DATALOCATOR table
-- List the links of all runs that have written to a database named D,
-- what the table names were and how many rows were written.
SELECT
X.ProjectName, X.JobName, R.RunStartTimestamp,
JS.ContainerPath, JS.StageName, JL.LinkName,
DL.DataCollectionName AS TableName, RL.TotalRows
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID
JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
JOIN DSODB.JOBLINK AS JL ON
(JS.STAGEID = JL.FROMSTAGEID OR JS.STAGEID = JL.TOSTAGEID)
JOIN DSODB.JOBRUNLINK AS RL ON R.RUNID = RL.RUNID
JOIN DSODB.DATALOCATOR AS DL ON RL.LOCATORID = DL.LOCATORID
WHERE
H.HostName = "H" AND JL.IsTarget = 1
AND DL.DataStoreName = "D"
ORDER BY TableName, RL.TotalRows DESC, R.RunStartTimestamp
Table for host system descriptions
Note that rows are only recorded in the HOSTDETAIL table (Figure 7) when the engine
is configured to run resource monitoring; that is, ResourceMonitor is not set to 0 in
DSODBConfig.cfg.
Figure 7. Key columns and relationships between the HOST and HOSTDETAIL
tables
8/10/2019 Dm 1205datastageopsdb1 PDF
14/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 14 of 22
The HOSTDETAIL table
Every row in the HOST table should have at least one row from the HOSTDETAIL table
related to it. These rows contain information about the operating system properties that
were in force at some time when an instance of the ResMonApp process started. The
CreatedTimestamp column is the UTC time when the particular set of properties was first inserted.The LastCheckedTimestamp is the UTC time when an instance of ResMonApp last determined
that the properties of that particular system had not changed since the previous check (i.e., none
of the values in any of the other columns needed to be changed). If any column would require a
different value, a new row is inserted, rather than the exiting row being updated. The new row has
its CreatedTimestamp and LastCheckedTimestamp set to "now." The rows form a timeline that can
be used to track changes to the operating system properties of a particular node.
The entries in HOSTDETAIL are related to the HOST table by two foreign keys. HOSTID is
the primary key of the HOST row that identifies which host name this set of detail records
describes. HEAD_HOSTID is the primary key of the HOST row that identifies the host on which
the instance of ResMonApp that inserted the detail record is running. In the case of a conductor
node, HOSTID and HEAD_HOSTID are the same; for a remote node, they are different. In the
latter case, the detail record is recording the properties of a remote node as seen from one
particular conductor node. Note that the unique primary key for this table is the combination of
HOSTID, HEAD_HOSTID, and CreatedTimestamp, since any change to columns other than
LastCheckedTimestamp results in a new row being inserted, rather than this row being updated.
Listing 14. SQL Example:Querying the HOSTDETAIL table 1
-- List operating system name and version for host H as they were at time
-- YYYY-MM-DD HH:MM:SS (this means finding the row whose CreatedTimestamp
-- is nearest to that time).
SELECT
Max(HD.CreatedTimestamp) AS "Last Changed",
HD.PlatformName, HD.PlatformVersion
FROM
DSODB.HOSTDETAIL AS HD
JOIN DSODB.HOST AS H ON HD.HEAD_HOSTID = H.HOSTID
WHERE
H.HostName = "H"
AND HD.CreatedTimestamp
8/10/2019 Dm 1205datastageopsdb1 PDF
15/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 15 of 22
Tables related to system resource usageThese tables hold information about how much CPU, memory, or other system resource is in
use at particular times (Figure 8). Note that rows are only recorded in the RESOURCESNAP
and RESOURCEUSAGE tables when the engine is configured to run resource monitoring (i.e.,
ResourceMonitor is not set to 0 in DSODBConfig.cfg).
Figure 8. RESOURCESNAP and RESOURCEUSAGE tables; key columns and
related views
The RESOURCESNAP table
This table only contains one row per host that is being monitored from a particular engine. ItsHOSTID column is a foreign key to an entry in the HOST table to identify which system's resources
are described. The HEAD_HOSTID column in RESOURCESNAP is also defined as a foreign key
that identifies which engine system inserted the row. In the case of a conductor node, HOSTID,
and HEAD_HOSTID are the same; for a remote node, they are different.
On each engine (corresponding to a HEAD_HOSTID value), for each system it is monitoring
including itself (the HOSTID values), a row is updated at intervals containing all the system
resource usage information being collected for that system. The LastUpdateTimestamp column
tells you when the last update took place.
The content of RESOURCESNAP should always be queried through either of the views
RESOURCESNAPSYSTEM or RESOURCESNAPDISKS.
For the full set of columns available via these views, please see the schema document reference
in the Resourcessection.
The RESOURCESNAPSYSTEM viewThis view presents several columns that hold various values for CPU, free memory, process
counts, and paging. The names of the columns are used to group them counters. So CPUPctxxxx
8/10/2019 Dm 1205datastageopsdb1 PDF
16/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 16 of 22
refers to percentage of CPU used; MemFreeKBxxxx refers to free memory in kilobytes;
ProcNumxxxx refers to number of processes; PageNumxxxx refers to number of paging events.
For example, CPUPctUser is the percentage of the CPU assigned to user processes in the interval
leading up to LastUpdateTimestamp; MemFreeKBPhysical is the number of kilobytes of physical
memory that were free in that period.
The RESOURCESNAPDISKS view
This view expands a repeating field in RESOURCESNAP, which is controlled by the number of
disk paths, if any, that have been entered in the DSODBConfig.cfg configuration file (properties
ResourceLocalFS and ResourceRemoteFS, which can be repeated). The DiskPathMonitored
column identifies each file system path, and the DiskTotalKB and DiskFreeKB columns give the
total number of kilobytes and number currently unused for the disk mounted on that path.
The RESOURCEUSAGE table
This contains counters derived from those in the RESOURCESNAP table, aggregated overintervals and arranged as a timeline. As an entry in RESOURCESNAP is updated, its values are
used to calculate the latest maximum, minimum, and average numbers over the most recent time
period, then at intervals a row is inserted to describe the behavior over that period. Using the
default configuration settings the RESOURCESNAP table is updated every 10 seconds, and every
60 seconds, a new row is inserted in RESOURCEUSAGE containing the maximum, minimum,
and averages of the last six updates for that combination of HOSTID and HEAD_HOSTID. The
StartTimestamp column of this table can be used to sort the entries into time order.
The RESOURCEUSAGESYSTEM view
For each column of RESOURCESNAP that holds system counters, there are three columns,with the suffixes Avg, Max, and Min. So CPUPctUserAvg is the average value derived
form the CPUPctUser column, and MemFreeKBPhysicalMin is the minimum value of the
MemFreeKBPhysical column over the period as recorded between StartTimestamp and
EndTimestamp.
Listing 16. SQL Example:Querying the RESOURCEUSAGESYSTEM view
-- List average user CPU percentage usage for engine system H
-- between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS.
SELECT
RS.StartTimestamp, RS.CPUPctUserAvg
FROM DSODB.RESOURCEUSAGESYSTEM AS RS
JOIN DSODB.HOST AS H ON RS.HEAD_HOSTID = H.HOSTID
WHERE
H.HostName= "H"
AND RS.StartTimeStamp >= "YYYY-MM-D1 HH:MM:SS"
AND RS.EndTimeStamp
8/10/2019 Dm 1205datastageopsdb1 PDF
17/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 17 of 22
Listing 17. SQL Example:Querying the RESOURCEUSAGEDISKS view
-- List any periods where the disk free KB for any paths
-- for host N as monitored from host H fell below 1 MB.
SELECT
RD.DiskPathMonitored, RD.StartTimestamp, RD.EndTimestamp,
RD.DiskFreeKBMin
FROM DSODB.RESOURCEUSAGEDISKS AS RD
JOIN DSODB.HOST AS H ON RD.HEAD_HOSTID = H.HOSTID
JOIN DSODB.HOST AS H2 ON RD.HOSTID = H2.HOSTID
WHERE
H.HostName = "H"
AND H2.HostName = "N"
AND RD.DiskFreeKBMin < 1024
Using the lookup tables to expand codes
There are a number of tables with columns that hold enumerated code values, designed as three-
character strings that are all uppercase ASCII and vaguely mnemonic (Figure 9). These can be
expanded into a more readable form if required by using the MASTERREF table and the viewsbuilt from it.
Figure 9. MASTERREF table; key columns and related views
The MASTERREF table
This contains a row for each distinct value of each enumerated type. The combination of the
Enumeration and Code columns forms the primary key that can be used by the following views to
look up a specific code value. The Name and Description columns then give you correspondinglylonger strings that can be used in reports instead of the code value itself.
The xxxxREF Views
Each view corresponds to a table and column that contains a code, as shown in Table 2.
Table 2. Reference views and the columns they describe
View name Table Column
8/10/2019 Dm 1205datastageopsdb1 PDF
18/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 18 of 22
RUNMAJORSTATUSREF JOBRUN RunMajorStatus
RUNMINORSTATUSREF JOBRUN RunMinorStatus
RUNTYPEREF JOBRUN RunType
JOBTYPEREF JOBEXEC JobType
LOGTYPEREF JOBRUNLOG LogType
STAGESTATUSREF JOBRUNSTAGE StageStatus
LINKTYPEREF JOBLINK LinkType
For example, if a row of the JOBRUN table contains "FIN"in the RunMajorStatus column, this can
be looked up via the RUNMAJORSTATUSREF view as Name = "Finished" and Description = "Run
has finished". A RunMinorStatus of "FWF"can be looked up via RUNMINORSTATUSREF as Name
= "Finished aborted" and Description = "Run has finished and logged at least one fatal
message".
Listing 18. SQL Example:Querying the JOBRUN table using lookups againstreference views
-- List the job names , job types, run types, and minor status names
-- of all runs on host H that have finished.
SELECT
X.ProjectName, X.JobName,
JT.JobTypeName, RT.RunTypeName,
R.RunStartTimestamp, R.RunEndTimestamp,
SR.MinorStatusName
FROM
DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
JOIN DSODB.JOBTYPEREF AS JT ON X.JobType = JT.JobTypeCode
JOIN DSODB.RUNTYPEREF AS RT ON R.RunType = RT.RunTypeCode JOIN DSODB.RUNMINORSTATUSREF AS SR ON R.RunMinorStatus = SR.MinorStatusCode
WHERE
H.HostName = "H"
AND R.RunMajorStatus = "FIN"
ORDER BY R.RunStartTimestamp
Conclusion
We have examined how the tables that make up the Operations Database schema contain
rows that describe job runs, host systems, and system resource usage. We have described the
identifying key fields for the main tables, some of the other important fields, and how foreign
key fields are used to join other tables to them. We have given several examples of real-worldquestions that can be answered by appropriate use of these fields.
Acknowledgements
Thanks go to the members of the DataStage Operations Console development team who ensured
that the schema got ported to the various flavors of database supported: DB2 LUW 9.5 and 9.7,
Oracle 10g and 11gR2, Microsoft SQL Server 2005 and 2008, on the various Windows and
UNIX platforms.
8/10/2019 Dm 1205datastageopsdb1 PDF
19/22
8/10/2019 Dm 1205datastageopsdb1 PDF
20/22
developerWorks ibm.com/developerWorks/
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 20 of 22
Resources
Learn
The operations database schema is documented on the IBM InfoCenter.
IBM InfoSphere Information Server 8.7 What's New whitepaper, containing details of theOperations Console.
IBM DB2 9.7 for Linux, UNIX, and Windows Information Centerand SQL reference.
Learn more about Information Management at the developerWorks Information Management
zone. Find technical documentation, how-to articles, education, downloads, product
information, and more.
Stay current with developerWorks technical events and webcasts.
Follow developerWorks on Twitter.
Get products and technologies
Build your next development project with IBM trial software, available for download directly
from developerWorks.
Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2
Express Edition for the community that offers the same core data features as DB2 Express
Edition and provides a solid base to build and deploy applications.
Discuss
Check out the developerWorks blogsand get involved in the developerWorks community.
http://www.ibm.com/developerworks/communityhttp://www.ibm.com/developerworks/blogs/http://www.ibm.com/developerworks/downloads/im/udbexp/index.htmlhttp://www.ibm.com/developerworks/downloads/http://www.twitter.com/developerworks/http://www.ibm.com/developerworks/offers/techbriefings/http://www.ibm.com/developerworks/data/http://www.ibm.com/developerworks/data/http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.sql.ref.doc%2Fdoc%2Fc0004100.htmlhttp://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsphttp://public.dhe.ibm.com/common/ssi/ecm/en/imw14590usen/IMW14590USEN.PDFhttp://publib.boulder.ibm.com/infocenter/iisinfsv/v8r7/topic/com.ibm.swg.im.iis.ds.monitor.ref.doc/topics/opsdb.html8/10/2019 Dm 1205datastageopsdb1 PDF
21/22
ibm.com/developerWorks/ developerWorks
Get started with the IBM InfoSphere DataStage and QualityStageOperations Console Database, Part 1: An introduction Page 21 of 22
About the authors
Len Greenwood
Len Greenwood was a member of the small development team that produced the first
version of DataStage in 1996, prior to it being acquired from Ascential Software byIBM in 2005. It now forms a mainstay of the IBM InfoSphere Information Server suite.
He has worked in the related areas of data and metadata integration for the past
15 years and is currently the main product architect for the core components of the
DataStage and QualityStage development and production tools. He recently designed
the database schema that underlies the Information Server Operations Console, used
to monitor activity at the DataStage engine level.
Arron Harden
Arron Harden is a senior software engineer for IBM InfoSphere DataStage and
QualityStage. Staying with the DataStage product after several mergers and
acquisitions, he has worked on DataStage for more than 12 years, joining IBM
through the acquisition of Ascential Software Inc in 2005. Having spent a year
working in Boston, he is currently based in the United Kingdom, working at the IBM
Milton Keynes office. In his most recent role, he was the lead developer for the web
application component of the DataStage and QualityStage Operations Console,
written using the Dojo toolkit.
Geoff McClean
Geoff McClean was on the original DataStage development team at its inception and
is currently a senior software developer for core components of the IBM InfoSphere
DataStage and QualityStage development and production tools, part of the IBM
InfoSphere Information Server suite. He oversaw the implementation of the database
management, event handling, and resource tracking services of the IBM InfoSphere
DataStage and QualityStage Operations Console.
Sumit Kumar
Sumit Kumar has 13 years of industry experience, including the financial, banking,
telecom, supply-chain management, insurance, and healthcare domains. He has
worked with IBM InfoSphere Information Server from early 2010, including the
Operations Console feature of Information Server, from the beginning of its design
8/10/2019 Dm 1205datastageopsdb1 PDF
22/22