    developerWorks (ibm.com/developerWorks/)

    Copyright IBM Corporation 2012

    Get started with the IBM InfoSphere DataStage and QualityStage

    Operations Console Database, Part 1: An introduction

    A deep dive into the key relationships of the schema that defines the

    Operations Database

    Len Greenwood, DataStage Core Architect, IBM

    Arron Harden, Senior Software Engineer, IBM

    Geoff McClean, Senior Software Developer, IBM

    Sumit Kumar, Senior Software Developer, IBM

    17 May 2012

    This article is a deep dive into the schema of the IBM InfoSphere DataStage and

    QualityStage Operations Database, and the tables and columns that make up its key

    relationships. Specimen SQL queries are included to demonstrate how data can be read

    from these tables to answer specific operational questions. You can adapt these to build, for

    example, custom reports based on the operational data collected at your particular DataStage

    and QualityStage installation.


    Introduction

    The DataStage and QualityStage Operations Database (DSODB) is designed as a relational

    schema whose tables contain information about job runs and system resources used on a system

    that has a DataStage engine installed. This information is used to drive the Operations Console,

    but can also be queried directly given some knowledge of the key columns of the tables and the

    relationships among them.

    This article describes the main tables and views of the DSODB, including all columns used

    as primary or natural keys, and all foreign key relationships defined. It gives examples of SQL

    SELECT statements that can be used to answer specific questions based on the information

    held by DSODB.

    Online reference material for DSODB 8.7, including descriptions of all current tables and columns,

    can be found in the schema document reference in the Resources section.

    Table diagrams

    Some pictures are included to illustrate the relationships between various tables. Each box

    represents a table or view. Relevant column names are shown, with icons against each column to

    show their use as follows (Figure 1).

    Figure 1. Key to types of column

    Note that only the columns of each table that are mentioned in the text of this article are shown.

    For a complete list of table columns, consult the schema document reference in the Resources

    section.

    The arrows between tables show the direction of relationships established by foreign keys and

    SQL views, as shown next (Figure 2).

    Figure 2. Key to relationships

    The HOST table

    The HOST table serves to partition the whole schema so information collected from multiple

    engines can be stored in a single database schema if required (Figure 3).

    Figure 3. HOST table keys

    Each row in the HOST table represents a system on which all or part of an InfoSphere Information

    Server engine is installed. The HOSTID column is a surrogate primary key; the HostName column

    is the system name as derived from the hostname operating system command or the HA_ALIAS

    environment variable, if that is set in the environment of the EngMonApp process when it starts.

    For systems acting as a conductor node and on which an instance of the EngMonApp process

    is running, the InstallationDir column is set to a path name. This is the home directory for the

    Information Server engine installation, such as /opt/IBM/InformationServer/Server/DSEngine. The

    key of the HOST table is the combination of HostName and InstallationDir, which allows for more

    than one engine to be installed on the same host system.

    The CreatedTimestamp column is the UTC time at which EngMonApp inserted the row. Each

    time an EngMonApp instance starts, it looks to see if there is a row in the table with the values of

    HostName and InstallationDir it requires, and if so, it updates the MonStartTimestamp column;

    otherwise, it creates a new row. ResMonApp also creates new HOST entries, as required.

    Remote nodes have InstallationDir set to a single hyphen character, '-'. These are systems whose

    HostNames have been defined in the DSODBConfig.cfg file via the ResourceNode property. These

    rows have a CreatedTimestamp, but no MonStartTimestamp entry.

    Cascading delete constraints are set so that deleting a row in HOST removes all rows that point

    back to it. Therefore, records of any jobs that ran on that system and all resource usage records

    are deleted at the same time.
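
The key structure and cascading delete described above can be tried out on a small scale. The following is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for DSODB. The column types, the cut-down JOBEXEC table, and the sample job name LoadOrders are assumptions made for illustration; this is not the real schema definition.

```python
import sqlite3

# Toy stand-in for DSODB: only columns named in the article are modelled,
# and all column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE HOST (
    HOSTID          INTEGER PRIMARY KEY,   -- surrogate key
    HostName        TEXT NOT NULL,
    InstallationDir TEXT NOT NULL,         -- '-' for remote nodes
    UNIQUE (HostName, InstallationDir)     -- allows several engines per host
);
CREATE TABLE JOBEXEC (
    JOBID       INTEGER PRIMARY KEY,
    HOSTID      INTEGER NOT NULL REFERENCES HOST(HOSTID) ON DELETE CASCADE,
    ProjectName TEXT NOT NULL,
    JobName     TEXT NOT NULL
);
""")

engine_dir = "/opt/IBM/InformationServer/Server/DSEngine"
conn.execute(
    "INSERT INTO HOST (HOSTID, HostName, InstallationDir) VALUES (1, 'H', ?)",
    (engine_dir,),
)
conn.execute("INSERT INTO HOST (HOSTID, HostName, InstallationDir) VALUES (2, 'N', '-')")
conn.execute("INSERT INTO JOBEXEC (HOSTID, ProjectName, JobName) VALUES (1, 'P', 'LoadOrders')")

# A second row with the same HostName and InstallationDir is rejected.
try:
    conn.execute("INSERT INTO HOST (HostName, InstallationDir) VALUES ('H', ?)", (engine_dir,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the HOST row also removes the JOBEXEC rows that point back to it.
conn.execute("DELETE FROM HOST WHERE HostName = 'H'")
jobs_left = conn.execute("SELECT COUNT(*) FROM JOBEXEC").fetchone()[0]
print(duplicate_rejected, jobs_left)
```

In this sketch the remote node N keeps its row, because only rows that reference the deleted HOSTID are affected by the cascade.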

    Tables related to job runs

    This section describes the tables used to record job run information in the default configuration.

    Note that all such rows ultimately relate to an entry in the HOST table (Figure 4). To completely

    identify a run via a SQL SELECT, it is therefore necessary to join on the HOSTID column if runs

    from more than one engine are recorded in the database. (The SQL examples below assume that is

    the case.)

    Figure 4. Key columns and relationships between tables related to JOBRUN

    The JOBEXEC table

    Every time a new instance of a DataStage and QualityStage job run starts, it creates an initial

    event describing which version of a job executable has started, when, and what its invocation ID

    string is in the case of a multi-instanceable job. The JOBEXEC table holds details about each

    unique job executable for which a run has been seen. Note that this might not be the same list as

    would be found by querying the Metadata Repository; a job therein might never have been run, or

    it might no longer exist if it has been deleted from the repository since it last ran.

    Rows have a surrogate key JOBID used to relate all runs and other details of this particular version

    of the job. Job names are unique within a DataStage project, and project names are unique for

    a given host. Whenever a job is compiled, the job's executable has potentially been changed,

    so one can distinguish between versions of a job's executable by including the compilation time.

    Hence, the unique primary key for the table is a combination of the ProjectName, JobName, and

    CompilationTimestamp columns, combined with the HOSTID as a foreign key to the HOST table to

    identify the system where the project is located.

    These rows are only created once. Each run of a job either finds an existing row put there by a

    previous run of the same version of the job or creates a new one.
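
The "find an existing row or create a new one" behavior can be illustrated with a short sketch. It uses Python with an in-memory SQLite table that models only the key columns named above. The function name find_or_create_jobexec, the column types, and the sample values are hypothetical; the monitoring system's actual implementation is not described in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE JOBEXEC (
    JOBID                INTEGER PRIMARY KEY,   -- surrogate key
    HOSTID               INTEGER NOT NULL,
    ProjectName          TEXT NOT NULL,
    JobName              TEXT NOT NULL,
    CompilationTimestamp TEXT NOT NULL,
    UNIQUE (HOSTID, ProjectName, JobName, CompilationTimestamp)
)""")


def find_or_create_jobexec(conn, hostid, project, job, compiled_at):
    """Return the JOBID for this version of the job, inserting a row only if none exists."""
    key = (hostid, project, job, compiled_at)
    row = conn.execute(
        "SELECT JOBID FROM JOBEXEC WHERE HOSTID = ? AND ProjectName = ? "
        "AND JobName = ? AND CompilationTimestamp = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO JOBEXEC (HOSTID, ProjectName, JobName, CompilationTimestamp) "
        "VALUES (?, ?, ?, ?)",
        key,
    )
    return cur.lastrowid


first = find_or_create_jobexec(conn, 1, "P", "LoadOrders", "2012-05-01 09:00:00")
again = find_or_create_jobexec(conn, 1, "P", "LoadOrders", "2012-05-01 09:00:00")
recompiled = find_or_create_jobexec(conn, 1, "P", "LoadOrders", "2012-05-02 14:30:00")
row_count = conn.execute("SELECT COUNT(*) FROM JOBEXEC").fetchone()[0]
```

Two runs of the same compiled version share one JOBID, while the recompiled job gets a row of its own, so row_count ends up as 2.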

    Listing 1. SQL Example: Querying the JOBEXEC table

    -- List names and locations of all job sequences that have
    -- ever been run on host H.
    SELECT
        X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp
    FROM
        DSODB.JOBEXEC AS X
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
    WHERE
        H.HostName = 'H'
        AND X.JobType = 'SEQ'
    ORDER BY X.ProjectName, X.FolderPath, X.JobName, X.CompilationTimestamp

    Cascading delete constraints are set so that deleting an entry in JOBEXEC removes all records to

    do with any runs of that version of the job.

    Listing 2. SQL Example: Deleting from the JOBEXEC table

    -- Delete all monitored information for all jobs that have
    -- been run from project P on host H.
    DELETE
    FROM
        DSODB.JOBEXEC
    WHERE
        JOBID IN (
            SELECT
                X.JOBID
            FROM
                DSODB.JOBEXEC AS X
                JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
            WHERE
                H.HostName = 'H'
                AND X.ProjectName = 'P'
        )

    The JOBRUN table

    One row is inserted in this table for each monitored run of a particular version of a job. It is updated

    as the job run progresses. The RUNID column is a surrogate primary key, used to relate run

    details from other tables (such as job parameters and log messages). The JOBID column is a

    foreign key to the row in the JOBEXEC table that contains the description of the exact version of

    the job executable that is running: which host it started on, in which project, the job's name,

    and its compilation time. The other columns that form the unique primary key are InvocationId

    and CreationTimestamp. The former is only relevant for jobs with the multi-instanceable property

    set; this allows more than one instance of a job to be running at once, as long as each instance

    has a different invocation ID, which is an arbitrary string assigned by the user at runtime. Note

    that if a run does not have an invocation ID string, the value is set to a single hyphen. The

    CreationTimestamp is the UTC time when the monitoring system created the first run event. Note

    that this might be different from the RunStartTimestamp, in the case of a run that gets queued. The

    RunStartTimestamp is the UTC time that the engine really launched the run, as opposed to when it

    was first submitted for running.
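
Because CreationTimestamp and RunStartTimestamp can differ for a queued run, the gap between them indicates how long a run waited before the engine launched it. The following is a minimal Python sketch. It assumes both values have already been fetched as strings in the YYYY-MM-DD HH:MM:SS form used by the listings in this article. The function name queued_seconds is hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed textual form of the UTC timestamps


def queued_seconds(creation_timestamp, run_start_timestamp):
    """Seconds between the first run event and the actual launch of the run."""
    created = datetime.strptime(creation_timestamp, TIMESTAMP_FORMAT)
    started = datetime.strptime(run_start_timestamp, TIMESTAMP_FORMAT)
    return (started - created).total_seconds()


waited = queued_seconds("2012-05-17 10:00:00", "2012-05-17 10:02:30")
not_queued = queued_seconds("2012-05-17 10:00:00", "2012-05-17 10:00:00")
```

Both columns are recorded in UTC, so no time zone conversion is needed before subtracting.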

    If a run is started from a job sequence, or via job control from another job, the

    CONTROLLING_RUNID column contains a foreign key to another entry in JOBRUN, which is the

    run that started this one. Therefore, all runs started by a particular instance of a sequence can be

    related by their common CONTROLLING_RUNID, and these potentially form a tree-structured set

    of relationships if job sequences are nested.
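
One way to walk such a tree is a recursive query that starts at the top-level run and repeatedly follows CONTROLLING_RUNID. The sketch below is an illustration under stated assumptions: it uses Python with an in-memory SQLite table containing only the two columns involved, the run IDs are invented, and the WITH RECURSIVE syntax shown is SQLite's. The recursive query syntax of the database that hosts DSODB may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE JOBRUN ("
    " RUNID INTEGER PRIMARY KEY,"
    " CONTROLLING_RUNID INTEGER REFERENCES JOBRUN(RUNID))"
)
conn.executemany(
    "INSERT INTO JOBRUN (RUNID, CONTROLLING_RUNID) VALUES (?, ?)",
    [
        (1, None),  # top-level job sequence
        (2, 1),     # nested sequence started by run 1
        (3, 1),     # job started by run 1
        (4, 2),     # job started by the nested sequence
        (5, None),  # unrelated run
    ],
)

# Start from one run and collect every run it started, directly or indirectly.
DESCENDANTS_SQL = """
WITH RECURSIVE tree (RUNID, Depth) AS (
    SELECT RUNID, 0 FROM JOBRUN WHERE RUNID = ?
    UNION ALL
    SELECT R.RUNID, tree.Depth + 1
    FROM JOBRUN AS R
    JOIN tree ON R.CONTROLLING_RUNID = tree.RUNID
)
SELECT RUNID, Depth FROM tree ORDER BY Depth, RUNID
"""

run_tree = conn.execute(DESCENDANTS_SQL, (1,)).fetchall()
```

The Depth value records how many levels of nesting separate each run from the sequence at the root of the tree.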

    As a run progresses, it generates other events when its status changes or when it produces new

    values for some of its monitored properties. These cause its row in JOBRUN to be updated, and

    the LastUpdateTimestamp column records the UTC time that last happened.

    A note about run statuses: two columns in the JOBRUN table hold status values,

    RunMajorStatus and RunMinorStatus. The major status can be used to quickly filter those runs

    that have finished from those that are starting up or still running. The minor status gives the

    specific status for the run (see the schema document reference in the Resources section for all

    possible values).

    Listing 3. SQL Example: Querying the JOBRUN table - 1

    -- List start and finish times, names, and statuses of all jobs that have
    -- been run on host H, started after time YYYY-MM-DD HH:MM:SS,
    -- and have finished.
    SELECT
        X.ProjectName, X.JobName,
        R.RunStartTimestamp, R.RunEndTimestamp, R.RunMinorStatus
    FROM
        DSODB.JOBRUN AS R
        JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
    WHERE
        H.HostName = 'H'
        AND R.RunMajorStatus = 'FIN'
        AND R.RunStartTimestamp >= 'YYYY-MM-DD HH:MM:SS'
    ORDER BY R.RunStartTimestamp

    Listing 4. SQL Example: Querying the JOBRUN table - 2

    -- List names of all jobs on host H that were running at time T and sort them
    -- in descending order of total CPU usage for the whole run.
    SELECT
        X.ProjectName, X.FolderPath, X.JobName,
        R.RunStartTimestamp, R.RunEndTimestamp, R.TotalCPU
    FROM
        DSODB.JOBRUN AS R
        JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
    WHERE
        H.HostName = 'H'
        AND R.RunStartTimestamp <= 'YYYY-MM-DD HH:MM:SS'
        AND ( R.RunEndTimestamp >= 'YYYY-MM-DD HH:MM:SS'
              OR R.RunEndTimestamp IS NULL )
    ORDER BY R.TotalCPU DESC

    Cascading delete constraints are set so that deleting an entry in JOBRUN deletes all related

    details of the run (parameters, logs, metrics, etc.). If the run started other runs (if it was a job

    sequence, for example), all those runs are also deleted, along with their details.

    Listing 5. SQL Example: Deleting from the JOBRUN table

    -- Delete all job runs from project P on host H
    -- that finished normally.
    DELETE
    FROM
        DSODB.JOBRUN
    WHERE
        RUNID IN (
            SELECT
                R.RUNID
            FROM
                DSODB.JOBRUN AS R
                JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
                JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
            WHERE
                H.HostName = 'H'
                AND X.ProjectName = 'P'
                AND R.RunMinorStatus = 'FOK'
        )

    Listing 7. SQL Example: Querying the JOBRUNLOG table

    -- List job details and message info for all runs on host H that have
    -- emitted fatal messages since time YYYY-MM-DD HH:MM:SS.
    SELECT
        X.ProjectName, X.JobName, R.RunStartTimestamp,
        L.LogTimestamp, L.MessageId, L.MessageText
    FROM
        DSODB.JOBRUN AS R
        JOIN DSODB.JOBRUNLOG AS L ON R.RUNID = L.RUNID
        JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
    WHERE
        H.HostName = 'H'
        AND L.LogType = 'FAT'
        AND R.RunStartTimestamp > 'YYYY-MM-DD HH:MM:SS'
    ORDER BY R.RunStartTimestamp, R.RUNID, L.EventID

    Constraints are set so that deleting an entry in JOBRUN deletes all corresponding entries in

    JOBRUNLOG.

    The PARALLELCONFIG table and PARALLELCONFIGNODES view

    Parallel job runs always specify a path to a file containing configuration information for the

    parallel engine (via the APT_CONFIG_FILE environment variable). The path is recorded in the

    ConfigFileName column of the JOBRUN table. However, that path might point to a temporary file,

    or the contents of the file may have changed since the job ran. So the monitoring system also reads

    and parses the file and stores information about how many nodes the run was configured for in the

    PARALLELCONFIG table.

    Since many runs can use the same configuration file, or at least files whose parallelization

    instructions are equivalent, the schema saves space by making all runs with equivalent

    configurations point to the same row. So the CONFIGID column in this table is a surrogate key and

    the CONFIGID column on the JOBRUN table is a foreign key to it. The PARALLELCONFIG table

    also has HOSTID as a foreign key, so configurations are partitioned by the host that the run started on.

    The list of physical and logical nodes in the configuration is held as an XML column. Use the

    PARALLELCONFIGNODES view to query the physical name of each node in the configuration and

    the number of logical nodes assigned to each. (Note that the NodeListHash column is used simply

    to find whether an existing row with the same configuration exists, since the XML column cannot

    be used directly in a lookup.)
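
The role of NodeListHash can be sketched as follows: hash a canonical form of the node list, look for an existing row with that hash, and insert a new row only if none is found. This Python example is a guess at the general technique, not the product's implementation. The SHA-256 hash, the canonical string, the XML layout, and the NodeListXML column name are all assumptions made for illustration.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE PARALLELCONFIG (
    CONFIGID     INTEGER PRIMARY KEY,   -- surrogate key
    HOSTID       INTEGER NOT NULL,
    NodeListHash TEXT NOT NULL,
    NodeListXML  TEXT NOT NULL          -- hypothetical name for the XML column
)""")


def node_list_hash(nodes):
    """Hash a {physical name: logical node count} mapping, ignoring its ordering."""
    canonical = ";".join(f"{name}={count}" for name, count in sorted(nodes.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def config_id_for(conn, hostid, nodes):
    """Reuse the row for an equivalent configuration, or insert a new one."""
    digest = node_list_hash(nodes)
    row = conn.execute(
        "SELECT CONFIGID FROM PARALLELCONFIG WHERE HOSTID = ? AND NodeListHash = ?",
        (hostid, digest),
    ).fetchone()
    if row is not None:
        return row[0]
    xml_nodes = "".join(
        f'<Node physical="{name}" logical="{count}"/>'
        for name, count in sorted(nodes.items())
    )
    cur = conn.execute(
        "INSERT INTO PARALLELCONFIG (HOSTID, NodeListHash, NodeListXML) VALUES (?, ?, ?)",
        (hostid, digest, f"<Nodes>{xml_nodes}</Nodes>"),
    )
    return cur.lastrowid


config_a = config_id_for(conn, 1, {"N1": 2, "N2": 2})
config_b = config_id_for(conn, 1, {"N2": 2, "N1": 2})  # same nodes, different order
config_c = config_id_for(conn, 1, {"N1": 4})           # a different configuration
```

Comparing a short hash string is a cheap equality test, which is why a separate hash column is useful when the full node list is stored in a form that cannot be matched directly.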

    Listing 8. SQL Example: Querying the PARALLELCONFIGNODES view

    -- Get the names of all job runs started on host H that used a node named N
    -- and ran between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS.
    SELECT
        R.RunStartTimestamp, X.ProjectName, X.JobName
    FROM
        DSODB.JOBRUN AS R
        JOIN DSODB.PARALLELCONFIGNODES AS N ON R.CONFIGID = N.CONFIGID
        JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
    WHERE
        H.HostName = 'H'
        AND N.PhysicalName = 'N'
        AND R.RunStartTimestamp >= 'YYYY-MM-D1 HH:MM:SS'
        AND R.RunEndTimestamp <= 'YYYY-MM-D2 HH:MM:SS'

    Figure 6. Key columns and relationships between tables related to

    JOBRUNSTAGE and JOBRUNLINK

    The JOBSTAGE table

    A row appears in this table for each stage to be reported on for a job run. It is created if it does

    not exist and relates back to the JOBEXEC entry for the current run via its JOBID foreign key. The

    entry contains static information about the stage, such as name, description, and stage type. Note

    that since stage names are only unique within the top level of the job, or within a container if they are

    at a lower level, the ContainerPath column might be needed in addition to the StageName column

    to fully identify the stage. Each row is given a surrogate key so it can be joined to

    its links.

    The JOBLINK table

    This contains rows for each link determined to be a source or target link for an overall job that

    has been run. It uses the foreign keys FROMSTAGEID and TOSTAGEID to point back to the row in

    the JOBSTAGE table that describes the stage it connects to in its role as a source or target. This

    depends on the setting of the IsSource and IsTarget columns; in general, only one is set and only

    one of the foreign keys is correspondingly related.

    The LinkName is unique, in combination with FROMSTAGEID/TOSTAGEID, since a stage cannot

    have two links with the same name.

    The JOBRUNSTAGE table

    Every run that is monitored in this mode generates one row per stage, to record, among other

    things, the number of CPU milliseconds that can be ascribed to the stage. A parallel job might

    be running each stage in a partitioned mode, which involves more than one process. Therefore,

    the NumInstances column might be greater than 1, and the InstanceCPUList might be a

    comma-separated string of numbers rather than a single value. But it is easier to select the TotalCPU

    column, which gives the overall value by summing entries in the list, if necessary.
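
To make the relationship between the two columns concrete, the following Python sketch sums a comma-separated list of the kind described for InstanceCPUList. It assumes the entries are whole numbers of CPU milliseconds. The function name total_cpu and the sample values are invented for illustration.

```python
def total_cpu(instance_cpu_list):
    """Sum a comma-separated list of per-instance CPU milliseconds."""
    return sum(int(part) for part in instance_cpu_list.split(",") if part.strip())


single_instance = total_cpu("120")             # stage ran as one process
four_partitions = total_cpu("120,95,110,101")  # stage ran in four partitions
```

Selecting TotalCPU directly avoids this parsing step, so the per-instance list is mainly of interest when checking how evenly work was spread across partitions.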

    Listing 11. SQL Example: Querying the JOBRUNSTAGE table

    -- List the stages of jobs run on host H in descending order of CPU used.
    SELECT
        X.ProjectName, X.JobName, R.RunStartTimestamp,
        JS.ContainerPath, JS.StageName, RS.TotalCPU
    FROM
        DSODB.JOBRUN AS R
        JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
        JOIN DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID
        JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
    WHERE
        H.HostName = 'H'
    ORDER BY RS.TotalCPU DESC

    The JOBRUNLINK table

    Every run of a job that has source or target links also generates one row per such link. This

    records the number of rows that passed down the link (these rows contribute to the "consumed" or

    "produced" total row counts if the link is marked as a source or target, respectively). As for stages,

    parallelization is recorded via a comma-separated list of row counts per partition, where relevant,

    and the TotalRows column is always the sum over all partitions.

    Listing 12. SQL Example: Querying the JOBRUNLINK table

    -- List the links, and the stages to which they are attached, of jobs
    -- run on host H in descending order of number of rows produced.
    SELECT
        X.ProjectName, X.JobName, R.RunStartTimestamp,
        JS.ContainerPath, JS.StageName, JL.LinkName, RL.TotalRows
    FROM
        DSODB.JOBRUN AS R
        JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
        JOIN DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID
        JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
        JOIN DSODB.JOBLINK AS JL ON
            (JS.STAGEID = JL.FROMSTAGEID OR JS.STAGEID = JL.TOSTAGEID)
        JOIN DSODB.JOBRUNLINK AS RL ON R.RUNID = RL.RUNID
    WHERE
        H.HostName = 'H'
        AND JL.IsTarget = 1
    ORDER BY RL.TotalRows DESC

    The DATALOCATOR table

    A link can also record "locator" information. This involves providing eight strings that identify the

    location of an external data resource. So for a sequential file stage, these strings should identify a

    file-system path name; for a database table, this should be a database type, name, schema, and

    table name. Exact details depend on the stage type. For each unique set of identifiers, there is a

    separate row in the DATALOCATOR table, with a surrogate key in the LOCATORID column. Each

    link that has such information attempts to see if a row with the same identifiers has been created

    and creates it if not. The LOCATORID column in the JOBRUNLINK table is then a foreign key to

    that row.

    Table 1. The DATALOCATOR columns are used as follows

    Column name             Usage
    ----------------------  --------------------------------------------------------
    ComputerName            Name of system that hosts the resource
    SoftwareProductName     Name of product that manages the resource
    DataStoreSubClass       Overall type of the data store
    DataStoreName           Overall name of database or path to top level of resource
    DataSchemaSubClass      Type of next level of resource if applicable
    DataSchemaName          Name of next level of resource if applicable
    DataCollectionSubClass  Type of lowest level of resource
    DataCollectionName      Name of lowest level of resource, such as table or file

    Listing 13. SQL Example: Querying the DATALOCATOR table

    -- List the links of all runs that have written to a database named D,
    -- what the table names were and how many rows were written.
    SELECT
        X.ProjectName, X.JobName, R.RunStartTimestamp,
        JS.ContainerPath, JS.StageName, JL.LinkName,
        DL.DataCollectionName AS TableName, RL.TotalRows
    FROM
        DSODB.JOBRUN AS R
        JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
        JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
        JOIN DSODB.JOBSTAGE AS JS ON X.JOBID = JS.JOBID
        JOIN DSODB.JOBRUNSTAGE AS RS ON JS.STAGEID = RS.STAGEID
        JOIN DSODB.JOBLINK AS JL ON
            (JS.STAGEID = JL.FROMSTAGEID OR JS.STAGEID = JL.TOSTAGEID)
        JOIN DSODB.JOBRUNLINK AS RL ON R.RUNID = RL.RUNID
        JOIN DSODB.DATALOCATOR AS DL ON RL.LOCATORID = DL.LOCATORID
    WHERE
        H.HostName = 'H' AND JL.IsTarget = 1
        AND DL.DataStoreName = 'D'
    ORDER BY TableName, RL.TotalRows DESC, R.RunStartTimestamp

    Table for host system descriptions

    Note that rows are only recorded in the HOSTDETAIL table (Figure 7) when the engine

    is configured to run resource monitoring; that is, ResourceMonitor is not set to 0 in

    DSODBConfig.cfg.

    Figure 7. Key columns and relationships between the HOST and HOSTDETAIL

    tables

    The HOSTDETAIL table

    Every row in the HOST table should have at least one row from the HOSTDETAIL table

    related to it. These rows contain information about the operating system properties that

    were in force at some time when an instance of the ResMonApp process started. The

    CreatedTimestamp column is the UTC time when the particular set of properties was first inserted.

    The LastCheckedTimestamp is the UTC time when an instance of ResMonApp last determined

    that the properties of that particular system had not changed since the previous check (i.e., none

    of the values in any of the other columns needed to be changed). If any column would require a

    different value, a new row is inserted, rather than the existing row being updated. The new row has

    its CreatedTimestamp and LastCheckedTimestamp set to "now." The rows form a timeline that can

    be used to track changes to the operating system properties of a particular node.

    The entries in HOSTDETAIL are related to the HOST table by two foreign keys. HOSTID is

    the primary key of the HOST row that identifies which host name this set of detail records

    describes. HEAD_HOSTID is the primary key of the HOST row that identifies the host on which

    the instance of ResMonApp that inserted the detail record is running. In the case of a conductor

    node, HOSTID and HEAD_HOSTID are the same; for a remote node, they are different. In the

    latter case, the detail record is recording the properties of a remote node as seen from one

    particular conductor node. Note that the unique primary key for this table is the combination of

    HOSTID, HEAD_HOSTID, and CreatedTimestamp, since any change to columns other than

    LastCheckedTimestamp results in a new row being inserted, rather than this row being updated.
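
The "update the check time if nothing changed, otherwise insert a new row" rule can be sketched as below. This is an illustrative Python example against an in-memory SQLite table, not the ResMonApp implementation. Only the key columns plus PlatformName and PlatformVersion are modelled, and the function name record_check, the column types, and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE HOSTDETAIL (
    HOSTID               INTEGER NOT NULL,
    HEAD_HOSTID          INTEGER NOT NULL,
    CreatedTimestamp     TEXT NOT NULL,
    LastCheckedTimestamp TEXT NOT NULL,
    PlatformName         TEXT,
    PlatformVersion      TEXT,
    PRIMARY KEY (HOSTID, HEAD_HOSTID, CreatedTimestamp)
)""")


def record_check(conn, hostid, head_hostid, now, platform_name, platform_version):
    """Extend the newest row if the properties are unchanged, else start a new row."""
    latest = conn.execute(
        "SELECT CreatedTimestamp, PlatformName, PlatformVersion FROM HOSTDETAIL "
        "WHERE HOSTID = ? AND HEAD_HOSTID = ? "
        "ORDER BY CreatedTimestamp DESC LIMIT 1",
        (hostid, head_hostid),
    ).fetchone()
    if latest is not None and latest[1:] == (platform_name, platform_version):
        conn.execute(
            "UPDATE HOSTDETAIL SET LastCheckedTimestamp = ? "
            "WHERE HOSTID = ? AND HEAD_HOSTID = ? AND CreatedTimestamp = ?",
            (now, hostid, head_hostid, latest[0]),
        )
    else:
        conn.execute(
            "INSERT INTO HOSTDETAIL VALUES (?, ?, ?, ?, ?, ?)",
            (hostid, head_hostid, now, now, platform_name, platform_version),
        )


record_check(conn, 1, 1, "2012-05-01 08:00:00", "Linux", "2.6.18")
record_check(conn, 1, 1, "2012-05-08 08:00:00", "Linux", "2.6.18")  # unchanged
record_check(conn, 1, 1, "2012-05-15 08:00:00", "Linux", "2.6.32")  # version changed
timeline = conn.execute(
    "SELECT CreatedTimestamp, LastCheckedTimestamp, PlatformVersion "
    "FROM HOSTDETAIL ORDER BY CreatedTimestamp"
).fetchall()
```

After the three checks the table holds two rows: the first covers the period in which the version stayed the same, and the second begins when the change was detected.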

    Listing 14. SQL Example: Querying the HOSTDETAIL table - 1

    -- List operating system name and version for host H as they were at time
    -- YYYY-MM-DD HH:MM:SS (this means finding the row whose CreatedTimestamp
    -- is nearest to that time).
    SELECT
        Max(HD.CreatedTimestamp) AS "Last Changed",
        HD.PlatformName, HD.PlatformVersion
    FROM
        DSODB.HOSTDETAIL AS HD
        JOIN DSODB.HOST AS H ON HD.HEAD_HOSTID = H.HOSTID
    WHERE
        H.HostName = 'H'
        AND HD.CreatedTimestamp <= 'YYYY-MM-DD HH:MM:SS'
    GROUP BY HD.PlatformName, HD.PlatformVersion

    Tables related to system resource usage

    These tables hold information about how much CPU, memory, or other system resource is in

    use at particular times (Figure 8). Note that rows are only recorded in the RESOURCESNAP

    and RESOURCEUSAGE tables when the engine is configured to run resource monitoring (i.e.,

    ResourceMonitor is not set to 0 in DSODBConfig.cfg).

    Figure 8. RESOURCESNAP and RESOURCEUSAGE tables; key columns and

    related views

    The RESOURCESNAP table

    This table contains only one row per host that is being monitored from a particular engine. Its

    HOSTID column is a foreign key to an entry in the HOST table to identify which system's resources

    are described. The HEAD_HOSTID column in RESOURCESNAP is also defined as a foreign key

    that identifies which engine system inserted the row. In the case of a conductor node, HOSTID

    and HEAD_HOSTID are the same; for a remote node, they are different.

    On each engine (corresponding to a HEAD_HOSTID value), for each system it is monitoring

    including itself (the HOSTID values), a row is updated at intervals containing all the system

    resource usage information being collected for that system. The LastUpdateTimestamp column

    tells you when the last update took place.

    The content of RESOURCESNAP should always be queried through either of the views

    RESOURCESNAPSYSTEM or RESOURCESNAPDISKS.

    For the full set of columns available via these views, please see the schema document reference

    in the Resources section.

    The RESOURCESNAPSYSTEM view

    This view presents several columns that hold various values for CPU, free memory, process

    counts, and paging. The column names share a prefix that groups related counters: CPUPctxxxx

    refers to percentage of CPU used; MemFreeKBxxxx refers to free memory in kilobytes;

    ProcNumxxxx refers to number of processes; PageNumxxxx refers to number of paging events.

    For example, CPUPctUser is the percentage of the CPU assigned to user processes in the interval

    leading up to LastUpdateTimestamp; MemFreeKBPhysical is the number of kilobytes of physical

    memory that were free in that period.

    The RESOURCESNAPDISKS view

    This view expands a repeating field in RESOURCESNAP, which is controlled by the number of

    disk paths, if any, that have been entered in the DSODBConfig.cfg configuration file (properties

    ResourceLocalFS and ResourceRemoteFS, which can be repeated). The DiskPathMonitored

    column identifies each file system path, and the DiskTotalKB and DiskFreeKB columns give the

    total number of kilobytes and number currently unused for the disk mounted on that path.

    The RESOURCEUSAGE table

    This contains counters derived from those in the RESOURCESNAP table, aggregated over

    intervals and arranged as a timeline. As an entry in RESOURCESNAP is updated, its values are

    used to calculate the latest maximum, minimum, and average numbers over the most recent time

    period, then at intervals a row is inserted to describe the behavior over that period. Using the

    default configuration settings, the RESOURCESNAP table is updated every 10 seconds, and every

    60 seconds, a new row is inserted in RESOURCEUSAGE containing the maximum, minimum,

    and averages of the last six updates for that combination of HOSTID and HEAD_HOSTID. The

    StartTimestamp column of this table can be used to sort the entries into time order.
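
The arithmetic behind this aggregation is simple: with the default settings, six 10-second snapshots make up each 60-second usage row. The Python sketch below illustrates it for a single counter. The two interval constants come from the defaults stated above, while the function name aggregate and the sample CPU percentages are invented for illustration.

```python
SNAP_INTERVAL_SECONDS = 10   # default RESOURCESNAP update interval, per the text
USAGE_INTERVAL_SECONDS = 60  # default RESOURCEUSAGE insert interval, per the text


def aggregate(cpu_pct_user_snapshots):
    """Turn a series of CPUPctUser snapshots into max/min/avg rows, one per full interval."""
    per_row = USAGE_INTERVAL_SECONDS // SNAP_INTERVAL_SECONDS  # six snapshots per row
    rows = []
    for start in range(0, len(cpu_pct_user_snapshots) - per_row + 1, per_row):
        window = cpu_pct_user_snapshots[start:start + per_row]
        rows.append({
            "CPUPctUserMax": max(window),
            "CPUPctUserMin": min(window),
            "CPUPctUserAvg": sum(window) / len(window),
        })
    return rows


# Two minutes of hypothetical snapshots: a busy minute, then an idle one.
samples = [10, 20, 30, 40, 50, 60, 5, 5, 5, 5, 5, 5]
usage_rows = aggregate(samples)
```

Keeping the maximum and minimum alongside the average preserves short spikes that an average alone would smooth away.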

    The RESOURCEUSAGESYSTEM view

    For each column of RESOURCESNAP that holds system counters, there are three columns,

    with the suffixes Avg, Max, and Min. So CPUPctUserAvg is the average value derived

    from the CPUPctUser column, and MemFreeKBPhysicalMin is the minimum value of the

    MemFreeKBPhysical column over the period as recorded between StartTimestamp and

    EndTimestamp.

    Listing 16. SQL Example:Querying the RESOURCEUSAGESYSTEM view

    -- List average user CPU percentage usage for engine system H

    -- between times YYYY-MM-D1 HH:MM:SS and YYYY-MM-D2 HH:MM:SS.

    SELECT

    RS.StartTimestamp, RS.CPUPctUserAvg

    FROM DSODB.RESOURCEUSAGESYSTEM AS RS

    JOIN DSODB.HOST AS H ON RS.HEAD_HOSTID = H.HOSTID

    WHERE

    H.HostName = 'H'

    AND RS.StartTimestamp >= 'YYYY-MM-D1 HH:MM:SS'

    AND RS.EndTimestamp <= 'YYYY-MM-D2 HH:MM:SS'


    Listing 17. SQL Example: Querying the RESOURCEUSAGEDISKS view

    -- List any periods where the disk free KB for any paths

    -- for host N as monitored from host H fell below 1 MB.

    SELECT

    RD.DiskPathMonitored, RD.StartTimestamp, RD.EndTimestamp,

    RD.DiskFreeKBMin

    FROM DSODB.RESOURCEUSAGEDISKS AS RD

    JOIN DSODB.HOST AS H ON RD.HEAD_HOSTID = H.HOSTID

    JOIN DSODB.HOST AS H2 ON RD.HOSTID = H2.HOSTID

    WHERE

    H.HostName = 'H'

    AND H2.HostName = 'N'

    AND RD.DiskFreeKBMin < 1024

    Using the lookup tables to expand codes

    There are a number of tables with columns that hold enumerated code values, designed as three-

    character strings that are all uppercase ASCII and vaguely mnemonic (Figure 9). These can be

    expanded into a more readable form if required by using the MASTERREF table and the views built from it.

    Figure 9. MASTERREF table; key columns and related views

    The MASTERREF table

    This contains a row for each distinct value of each enumerated type. The combination of the

    Enumeration and Code columns forms the primary key that can be used by the following views to

    look up a specific code value. The Name and Description columns then give you correspondingly longer strings that can be used in reports instead of the code value itself.
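
A minimal sketch of reading the table directly, using only the four columns named above, is shown here. It lists every code together with its expanded name and description, ordered by the two primary key columns. The optional filter in the comment is an illustrative assumption about how you might narrow the output to one code.

```sql
-- Sketch: list every enumerated code with its expanded name and description.
SELECT
  M.Enumeration, M.Code, M.Name, M.Description
FROM DSODB.MASTERREF AS M
-- To look at a single code in every enumeration that uses it, add for example:
-- WHERE M.Code = 'FIN'
ORDER BY M.Enumeration, M.Code
```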

    The xxxxREF Views

    Each view corresponds to a table and column that contains a code, as shown in Table 2.

    Table 2. Reference views and the columns they describe

    View name            Table          Column
    RUNMAJORSTATUSREF    JOBRUN         RunMajorStatus
    RUNMINORSTATUSREF    JOBRUN         RunMinorStatus
    RUNTYPEREF           JOBRUN         RunType
    JOBTYPEREF           JOBEXEC        JobType
    LOGTYPEREF           JOBRUNLOG      LogType
    STAGESTATUSREF       JOBRUNSTAGE    StageStatus
    LINKTYPEREF          JOBLINK        LinkType

    For example, if a row of the JOBRUN table contains "FIN" in the RunMajorStatus column, this can

    be looked up via the RUNMAJORSTATUSREF view as Name = "Finished" and Description = "Run

    has finished". A RunMinorStatus of "FWF" can be looked up via RUNMINORSTATUSREF as Name

    = "Finished aborted" and Description = "Run has finished and logged at least one fatal

    message".

    Listing 18. SQL Example: Querying the JOBRUN table using lookups against reference views

    -- List the job names, job types, run types, and minor status names

    -- of all runs on host H that have finished.

    SELECT

    X.ProjectName, X.JobName,

    JT.JobTypeName, RT.RunTypeName,

    R.RunStartTimestamp, R.RunEndTimestamp,

    SR.MinorStatusName

    FROM

    DSODB.JOBRUN AS R

    JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID

    JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID

    JOIN DSODB.JOBTYPEREF AS JT ON X.JobType = JT.JobTypeCode

    JOIN DSODB.RUNTYPEREF AS RT ON R.RunType = RT.RunTypeCode

    JOIN DSODB.RUNMINORSTATUSREF AS SR ON R.RunMinorStatus = SR.MinorStatusCode

    WHERE

    H.HostName = 'H'

    AND R.RunMajorStatus = 'FIN'

    ORDER BY R.RunStartTimestamp
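
As a variation on Listing 18, the same joins can be used to summarize rather than list runs. The sketch below counts the finished runs on host H for each minor status name. It relies only on table and column names that appear in Listing 18; the RunCount alias is introduced here for illustration.

```sql
-- Sketch: count the finished runs on host H for each minor status name.
SELECT
  SR.MinorStatusName,
  COUNT(*) AS RunCount
FROM DSODB.JOBRUN AS R
JOIN DSODB.JOBEXEC AS X ON R.JOBID = X.JOBID
JOIN DSODB.HOST AS H ON X.HOSTID = H.HOSTID
JOIN DSODB.RUNMINORSTATUSREF AS SR ON R.RunMinorStatus = SR.MinorStatusCode
WHERE
  H.HostName = 'H'
  AND R.RunMajorStatus = 'FIN'
GROUP BY SR.MinorStatusName
ORDER BY SR.MinorStatusName
```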

    Conclusion

    We have examined how the tables that make up the Operations Database schema contain

    rows that describe job runs, host systems, and system resource usage. We have described the

    identifying key fields for the main tables, some of the other important fields, and how foreign

    key fields are used to join other tables to them. We have given several examples of real-world questions that can be answered by appropriate use of these fields.

    Acknowledgements

    Thanks go to the members of the DataStage Operations Console development team who ensured

    that the schema got ported to the various flavors of database supported: DB2 LUW 9.5 and 9.7,

    Oracle 10g and 11gR2, Microsoft SQL Server 2005 and 2008, on the various Windows and

    UNIX platforms.


    Resources

    Learn

    The operations database schema is documented on the IBM InfoCenter.

    IBM InfoSphere Information Server 8.7 What's New whitepaper, containing details of the Operations Console.

    IBM DB2 9.7 for Linux, UNIX, and Windows Information Center and SQL reference.

    Learn more about Information Management at the developerWorks Information Management

    zone. Find technical documentation, how-to articles, education, downloads, product

    information, and more.

    Stay current with developerWorks technical events and webcasts.

    Follow developerWorks on Twitter.

    Get products and technologies

    Build your next development project with IBM trial software, available for download directly

    from developerWorks.

    Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2

    Express Edition for the community that offers the same core data features as DB2 Express

    Edition and provides a solid base to build and deploy applications.

    Discuss

    Check out the developerWorks blogs and get involved in the developerWorks community.


    About the authors

    Len Greenwood

    Len Greenwood was a member of the small development team that produced the first

    version of DataStage in 1996, prior to it being acquired from Ascential Software by IBM in 2005. It now forms a mainstay of the IBM InfoSphere Information Server suite.

    He has worked in the related areas of data and metadata integration for the past

    15 years and is currently the main product architect for the core components of the

    DataStage and QualityStage development and production tools. He recently designed

    the database schema that underlies the Information Server Operations Console, used

    to monitor activity at the DataStage engine level.

    Arron Harden

    Arron Harden is a senior software engineer for IBM InfoSphere DataStage and

    QualityStage. Staying with the DataStage product after several mergers and

    acquisitions, he has worked on DataStage for more than 12 years, joining IBM

    through the acquisition of Ascential Software Inc in 2005. Having spent a year

    working in Boston, he is currently based in the United Kingdom, working at the IBM

    Milton Keynes office. In his most recent role, he was the lead developer for the web

    application component of the DataStage and QualityStage Operations Console,

    written using the Dojo toolkit.

    Geoff McClean

    Geoff McClean was on the original DataStage development team at its inception and

    is currently a senior software developer for core components of the IBM InfoSphere

    DataStage and QualityStage development and production tools, part of the IBM

    InfoSphere Information Server suite. He oversaw the implementation of the database

    management, event handling, and resource tracking services of the IBM InfoSphere

    DataStage and QualityStage Operations Console.

    Sumit Kumar

    Sumit Kumar has 13 years of industry experience, including the financial, banking,

    telecom, supply-chain management, insurance, and healthcare domains. He has

    worked with IBM InfoSphere Information Server from early 2010, including the

    Operations Console feature of Information Server, from the beginning of its design
