Date post: | 25-Feb-2018 |
Upload: | christopher-williams |
7/25/2019 04 Metadata and Metadata Management
DataStage Fundamentals (version 8 parallel jobs) Staffordshire, December 2010
Module 4 Metadata and Metadata Management
Metadata is a term that means "information about data". DataStage relies heavily upon metadata to describe the data that are to be processed, the format of the data, the processing that is required, and so on. Additional metadata can be used to answer end users' questions such as "where did this value come from?"
Information Server has a unified metadata layer through which metadata can be shared among many products, including DataStage. In addition each DataStage project has its own, local repository for metadata. All of this metadata is available to DataStage users, and therefore must be managed rigorously.
Objectives
Having completed this module you will be able:
to list three classes of metadata
to import DataStage components from a given DataStage export file
to inspect metadata in the Repository using Designer
to use Quick Find and Advanced Find in the Repository
to define "nullable"
to export DataStage components from the Repository
Metadata
The word metadata comes from a Greek prefix "meta", meaning above, and "data". "Data" is the plural past participle of the Latin verb "dare", meaning "to give". The singular past participle is "datum", something (that has been) given. So "data" literally means things (that have been) given.
In information technology (IT), the word "metadata" is usually taken to mean "information that describes data", and everyone claims to understand what "data" means in IT.
Metadata allow questions about the data to be answered. For example, an end user may be looking at a pie chart in which one sector contains 26% of the overall total. The user may be interested to know how up to date the data are, whether they are complete, what relationship they bear to the operational systems' data, and what processing they underwent between there and the pie chart.
There are several classes of metadata. Authorities differ on how many. For a DataStage developer the three most important are listed here.
Business metadata incorporates all knowledge about the data that the business has (or ought to have). This might include business rules (for example "a customer number has the following format", "metric measures of distance are converted to US measures in the DW", "order date must be no later than current date during data entry", and so on), and ownership and/or responsibility (for example "the product price table is owned by, and kept up to date by, the sales management group"). Quite often, business metadata are produced by people with titles such as business analyst or metadata steward.
Technical metadata are those that describe the technical aspects of data, such as the format (particularly of text files), the rows and columns, SQL data types, and so on. Technical metadata also describe the processing that occurs to the data, not only during ETL but also during original data entry and any reformatting that BI tools might perform. These often become specifications with which programmers/developers work.
Process metadata are no less important. These record what processing actually took place, whether all records were processed or some were rejected.
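Business rules of the kind quoted under business metadata usually end up as validation logic. The following is a minimal Python sketch of two of them, not DataStage code. The customer number format (two uppercase letters followed by six digits) is purely an assumption for illustration, since the actual format is not given here.

```python
import re
from datetime import date

# Hypothetical format: two uppercase letters then six digits, e.g. "AB123456".
CUSTOMER_NUMBER = re.compile(r"[A-Z]{2}\d{6}")


def check_order(customer_number: str, order_date: date, today: date) -> list[str]:
    """Return the business rules that the record violates (empty if none)."""
    violations = []
    if CUSTOMER_NUMBER.fullmatch(customer_number) is None:
        violations.append("customer number has wrong format")
    if order_date > today:
        violations.append("order date is later than current date")
    return violations
```

Returning a list of violated rules, rather than a single true/false result, keeps a record of which business rule rejected a row.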
Business Metadata in DataStage
Business metadata are typically maintained outside of DataStage, perhaps by a business analyst using the Business Glossary product (another product in the Information Server suite). DataStage does not directly use business metadata, but having it available can assist developers in, for example, assigning correct validation logic. The usual place where business metadata is to be found in the DataStage repository is in Data Elements. Business metadata can also be found in annotations in job designs and in description fields on jobs, stages and links.
Figure 4-1 Example of Data Element
Figure 4-1 shows an example of a Data Element. The SQL tab allows the most likely data type for this element to be recorded, though it is not enforced. The other two tabs define the data element's relationships with DataStage Transforms, which are available only in server jobs, not in parallel jobs (this class is about parallel jobs, so Transforms will not be discussed).
Data elements can be added to any table definition, to highlight that a particular field has business metadata to be carried with it. Usage analyses can be performed on data elements, for example to answer developer questions such as "which jobs process revenue?" (assuming there is a data element called Revenue or something similar).
Technical Metadata in DataStage
There are four main areas into which technical metadata for parallel jobs may be grouped. These are configurations, table definitions, source code (primarily of routines) and ETL job designs themselves.
Configurations cover a wide range of things, most of which we discuss in other modules or in the Administrator class.
The most obvious are the parallel execution configuration files. Each of these provides a list of hosts and resources (nodes) on which parallel execution can take place. Different configuration files can be used for different tasks; for example a one-way configuration is best suited for processing a single row, a one-way or two-way configuration is suited to a small volume of data, whereas a 36-way configuration could process a very large volume of data indeed.
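As a rough illustration of what such a file contains, the Python sketch below renders one node entry per host. The keywords (node, fastname, pools, resource disk, resource scratchdisk) follow the usual layout of a parallel configuration file as we understand it. The host name and directory paths are hypothetical, so check the product documentation before relying on the exact syntax.

```python
def render_config(hosts: list[str], disk: str, scratch: str) -> str:
    """Render one node entry per host in parallel configuration file style."""
    lines = ["{"]
    for i, host in enumerate(hosts, start=1):
        lines += [
            f'  node "node{i}"',
            "  {",
            f'    fastname "{host}"',                      # host the node runs on
            '    pools ""',                                # default node pool
            f'    resource disk "{disk}" {{pools ""}}',    # permanent storage
            f'    resource scratchdisk "{scratch}" {{pools ""}}',  # temporary storage
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-way configuration on a single machine.
two_way = render_config(["etlhost", "etlhost"], "/data/ds", "/scratch/ds")
```

Passing one host gives a one-way configuration. Listing the same host twice, as above, gives a two-way configuration on a single machine.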
Routines are of three kinds. Parallel routines, which can be called from the Transformer stage in parallel jobs, are created outside of DataStage in the C++ language and compiled and linked. An entry is placed into the repository to record the name, arguments and location of the library or object containing the routine.
Server routines are created within DataStage, using DataStage BASIC as the programming language. They are stored directly in the repository and are of two kinds.
Before/after subroutines can be used in parallel jobs, in server jobs and in active stages in server jobs.
Transform functions can be used in the server Transformer stage, in the BASIC Transformer stage in parallel jobs, and in Routine activities in job sequences. (There is a full course called Programming with DataStage BASIC available. We do not have time to cover creation of routines in this class.)
Job designs are created using DataStage Designer and are stored directly in the repository.
Process Metadata in DataStage
Each time a job runs, it keeps a log of its activity and periodically updates status information such as CPU usage and row counts. This information is stored in the Repository, and may be viewed using the Director client and reported on using the reporting console of Information Server.
Environment variable options allow the collection of extra information about processing; most of these are in the Reporting folder.
Figure 4-2 Environment Variables That Control Reporting
Metadata Repository
Metadata need to be stored somewhere so that they can be used. For DataStage, metadata are stored in the "metadata repository".
In fact there are two metadata repositories. Each DataStage project has a local repository and there is a central, unified metadata repository for all Information Server products. "Unified" in this context means that the metadata are stored in a format such that they are accessible, via the metadata delivery and analysis services, by any Information Server product.
When using DataStage, you are not aware, in general, in which of the two repositories your particular metadata resides. Metadata may be in one, the other, or both. You access the metadata repository through the Repository toolbar in the Designer client.
Figure 4-3 Repository Toolbar
Figure 4-3 shows the Repository toolbar in a project called (as indicated by the tab) Demonstrations. Chances are that your Repository will have a different set of folders, since the structure is completely customizable. Some of the folders in Figure 4-3 relate to QualityStage jobs, some relate to mainframe jobs. If you do not have these capabilities installed on your DataStage server, then you may not see these folders.
In the Designer client, the repository is organized as a tree, in which you create as many branches as needed. Be careful, though, not to create so complex a structure that it becomes impossible to maintain.
Figure 4-4 Repository With One Branch Expanded
In Figure 4-4 the Routines branch in the Repository has been expanded. […] "read only".
To make a copy, select the object's name, right-click and choose "Create a copy" from the menu. For example, if the object is called XYZ, then the new object will be called Copy […]
[…] Rename option available, or you can rename most objects within their editing dialog.
Creating New Categories
To create a new folder anywhere in the repository, right-click on the folder which will be the parent of the new folder. (There is a "project" folder at the very top of the tree that can serve as the parent of new top-level folders.)
Choose New from the popup menu, then Folder from the subsequently displayed menu. Figure 4-5 illustrates this process, creating a new subfolder in the ParameterSets branch of the Repository.
Figure 4-5 Creating a New Folder
This will open a revised Repository toolbar with the newly created folder, named NewFolder, selected (highlighted) and waiting for its name to be changed. This is shown in Figure 4-6. You should, of course, rename the new folder immediately to something more meaningful.
Figure 4-6 Newly Created Folder
Deleting a Folder
When you select a folder in the repository and press the Del button on your keyboard, or right-click the folder and choose Delete from the menu, you might be deleting not just the folder but also its entire contents.
To help guard against the possibility of accidental deletion, a confirmation dialog appears asking you to confirm deletion of the selected items. In the case of deleting a single folder this dialog will have only the named folder in its list. However, deletion can be initiated from the result of a search of the Repository, so that the confirmation dialog allows you to limit the items to be deleted to just those which you select in the dialog itself.
Searching the Repository: Quick Find
There are two tools for searching the repository: Quick Find and Advanced Find. Let's look at Quick Find first.
Whenever you are working in the Repository there is almost always a link to […]
Figure 4-8 Quick Find Dialog (Types to find List, Partial)
The "Include description" check box extends the search for the indicated string or wildcard pattern to the description fields of DataStage objects.
The initial result of Quick Find is an expanded repository tree with the first object in which the search was successful highlighted. Next and Prev buttons allow this tree view to be navigated.
Figure 4-9 Quick Find Initial Result
Figure 4-9 shows the results of an unconstrained search for "Month". 16 "hits" were obtained, the first being in the DateGenericToTimestamp routine in the \Routines\sdk\Date folder.
Clicking on the "16 matches" link or on the Adv button would open the Advanced Find capability. Alternatively you can right-click on any of the selected objects (or, indeed, any of the objects) and perform other activities such as rename, export, or "where used" or "dependencies" analyses.
Searching the Repository: Advanced Find
The Advanced Find dialog offers the same search capabilities as Quick Find, but with a greater range of filters available.
Figure 4-10 Advanced Find
Figure 4-10 shows the same search as was illustrated for Quick Find, namely for the word "Month" occurring in the object name or description. In Advanced Find, however, you can specify different words in the description while still filtering on the object name.
[…] the case. This cue remains visible even though that particular part of the filter has been minimized.
Figure 4-11 Advanced Find (Created Filter Dialog)
Where used allows you to set up a list of repository objects so that the search finds only objects that use the objects in your list.
Dependencies of allows you to set up a list of repository objects so that the search finds only objects that are dependencies of any of the objects in your list. For example, a job can be a dependency of a job sequence, a routine can be a dependency of a job or even of another routine.
Type specific allows you to set a table definition that will be used to find those table definitions in the repository that are related via the same shared Table. In this context, a "shared Table" is a table definition in the common, unified metadata Repository for Information Server.
There are four Options. Search can be case sensitive or not, can be within the last result set only (or not), can include nested results for dependency searches, and can search for a match in object name or description or both.
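The difference between "where used", "dependencies of" and the nested results option can be sketched in Python as a walk over a small graph. The object names and the USES mapping are hypothetical. The real repository is queried through the Advanced Find dialog, not through code like this.

```python
# Hypothetical repository: each object maps to the objects it uses directly.
USES = {
    "SeqNightly": ["JobLoadSales"],     # a job sequence that runs a job
    "JobLoadSales": ["RtnCleanDate"],   # a job that calls a routine
    "RtnCleanDate": ["RtnParseDate"],   # a routine that calls another routine
    "RtnParseDate": [],
}


def dependencies_of(name, nested=False):
    """Objects that `name` uses; with nested=True, follow the chain onward."""
    found, queue = [], list(USES.get(name, []))
    while queue:
        dep = queue.pop(0)
        if dep in found:
            continue
        found.append(dep)
        if nested:
            queue.extend(USES.get(dep, []))
    return found


def where_used(name):
    """Objects that use `name` directly."""
    return [obj for obj, deps in USES.items() if name in deps]
```

Without nested results, only the direct dependencies of the job sequence are reported. With nested results, the routines called by its job are included as well.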
Table Definitions
DataStage uses the term "table definition" to mean any form of record layout definition. The term has its origin in database terminology but has been extended, for DataStage use, to mean record layout metadata from any source.
So, for example, DataStage records the format of a sequential file as its "table definition". DataStage records the format of a C […]
Figure 4-12 Table Definition Layout Tab
Exporting DataStage Components
DataStage components (that is, any object in the repository) can be exported into a text file. Two formats are available.
A DSX (DataStage export) file is the original format used by DataStage. It is the more compact of the two formats, a factor that might be considered if, for example, contemplating emailing the export file.
The other format uses XML (extensible markup language) which identifies each component with its own pair of tags as well as using tags and a style sheet to represent the relationships between components.
Exporting DataStage components is accomplished via the Export menu in Designer, or by choosing Export from the results of a Quick Find or an Advanced Find. The Repository Export dialog allows you to specify what to export, and where.
Figure 4-13 DataStage Repository Export Dialog
The Items to Export pane contains the list of items to be exported. The Add link reinvokes Quick Find to locate more items. Eventually you have a list of items in this field, some or all of which you have selected to be exported. The status bar at the bottom of the window reports how many objects have been selected and how many of these will be ignored (not exported). For example, if any read-only items have been selected and "Exclude read-only items" is set, then these read-only items will be ignored.
The export file is always on the client machine. The "Type of export" field governs the format of the export file and also its filename suffix; DSX files have ".dsx" as their suffix, while XML export files have ".xml" as their suffix.
If "Append to existing file" is not selected and the export file already exists, an […]
Importing DataStage Components
Using DataStage Designer's Import menu you can import DataStage components that have been exported from any DataStage project.
Under the Import menu the first two options are "DataStage components" for importing from a DSX-format file, and "DataStage components (XML)" for importing from an XML-format file. The DataStage Repository Import dialog is relatively simple.
Figure 4-14 DataStage Repository Import Dialog
Importing Table Definitions
As noted earlier, a "table definition" in DataStage describes the record layout in any data source; it does not have to be a database table. Table definitions can be imported into the repository from a number of sources as illustrated in Figure 4-15.
Figure 4-15 DataStage Table Definition Import Menu
In later modules we will investigate a couple of these in somewhat more detail. As a general principle, however, each opens a wizard that takes you through identifying the metadata source, retrieving the definitions from that source and storing them in a particular category in the repository.
The Connector import wizard allows table definitions to be imported into the DataStage repository from the unified Information Server repository. Sequential File definitions are unusual in that you also have to specify format information, as well as importing/defining column definitions.
Customarily table definitions are stored in the Table Definitions branch of the repository with two levels of category, data source type and data source name. For example a table definition imported from an […]
[…] category \Table Definitions\ […] "nullable", meaning that they may contain NULL.
In the context of database tables, NULL indicates that there is no known value for this field in the current row. How NULL is stored is different in different databases, and immaterial.
Because NULL is unknown, there are very few operations that can be performed with it. For example, adding 3 to an unknown value yields a still-unknown value.
Functions in parallel Transformer stages are particularly intolerant of NULL: you need to handle NULL specifically.
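The "unknown in, unknown out" behaviour, and the need to handle NULL before calling a function, can be sketched in Python using None as a stand-in for NULL. This illustrates the concept only; it is not Transformer stage syntax, and the function names are our own.

```python
from typing import Optional


def add_nullable(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Add two values, treating None as NULL: unknown in, unknown out."""
    if a is None or b is None:
        return None
    return a + b


def safe_upper(text: Optional[str]) -> Optional[str]:
    """Handle NULL explicitly before calling a function that cannot accept it."""
    return None if text is None else text.upper()
```

Calling text.upper() directly on None would raise an error, which is the Python counterpart of a function that is intolerant of NULL.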
Two tests may be performed with nullable fields: you can ask whether the value IS NULL or whether the value IS NOT NULL. […] "outer" source will return NULL if there is no match on the join condition.
Within a DataStage table definition any field can be marked as Nullable or not. However, if there is any possibility that this field may contain NULL then it must be marked Nullable.
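To see why the Nullable marking matters, consider this minimal Python sketch of a table definition as a list of column definitions. The Column class and the column names are hypothetical, and None again stands in for NULL.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical column definition: name, SQL type and Nullable marking."""
    name: str
    sql_type: str
    nullable: bool = False


def null_violations(columns, row):
    """Names of columns not marked nullable whose value in `row` is None."""
    return [c.name for c in columns if not c.nullable and row.get(c.name) is None]
```

A row with no value for a column that is not marked Nullable is reported as a violation, which is the situation the rule above is designed to prevent.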
Text files have no data types, and therefore no implicit concept of NULL. With sequential files, therefore, it is necessary to specify some text string which, if encountered, will be understood to represent NULL. This is covered in more detail in the module on Sequential Files.
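The idea of a text string that stands for NULL can be sketched in Python as follows. The marker "NULL" and the comma-separated layout are assumptions for the example; in practice the string is whatever has been specified for the file.

```python
import csv
import io

NULL_STRING = "NULL"   # assumed marker; the real string is chosen per file


def read_rows(text, null_string=NULL_STRING):
    """Parse comma-separated text, turning the null marker into None."""
    reader = csv.reader(io.StringIO(text))
    return [[None if field == null_string else field for field in row]
            for row in reader]
```

Note that an empty field is not the same thing as the null marker: it is read as an empty string, not as NULL.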
DataStage's internal representation of NULL is usually a single byte whose binary value is 10000000. However, in environments where this byte is used to represent the Euro currency symbol, a different byte value can be configured for DataStage to use. DataStage's internal NULL is referred to as an "out-of-band null".
This value can be represented as an int8 field whose value is -128.
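The byte value can be checked with a few lines of Python. The bit pattern 10000000 reads as 128 when treated as an unsigned byte and as -128 when treated as a signed 8-bit integer. The Windows-1252 code page is used here as one example of an encoding in which the same byte is the Euro sign; which code pages actually matter at a given site is an assumption.

```python
NULL_BYTE = bytes([0b10000000])   # the single byte with binary value 10000000

unsigned_value = int.from_bytes(NULL_BYTE, "big", signed=False)   # 128
signed_value = int.from_bytes(NULL_BYTE, "big", signed=True)      # -128

# In the Windows-1252 code page the Euro sign occupies the same byte value.
euro_byte = "\u20ac".encode("cp1252")
clashes_with_euro = euro_byte == NULL_BYTE                        # True
```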
An "in-band null" is a special value, legal for the data type, that is used to represent NULL even though it is not. For example, an in-band null for DateHired (data type Date) might be 1800-01-01, a legal date but impossible in data as a date hired. Therefore any representation of NULL in a Sequential File stage is, effectively, an "in-band null".
Conversion functions exist for switching between out-of-band and in-band null, and for generating null. These are different in the Transformer stage and the Modify stage.
Table 4-1 Null Handling Functions
Description                  Modify Stage    Transformer Stage
Test for null                null()          IsNull()
Test for not null            notnull()       IsNotNull()
Convert null to value        handle_null()   NullToValue(), NullToEmpty(), NullToZero()
Convert to in-band null      handle_null()   NullToValue(), NullToEmpty(), NullToZero()
Convert to out-of-band null  make_null()     SetNull()
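The two directions of conversion in Table 4-1 can be pictured with Python analogues. These are illustrations of the idea only, not the DataStage functions themselves; their names and argument order are our own. None plays the part of the out-of-band null and the 1800-01-01 date from the DateHired example is the in-band null.

```python
from datetime import date

IN_BAND_NULL_DATE = date(1800, 1, 1)   # the DateHired sentinel used in the text


def null_to_value(value, replacement):
    """Out-of-band null (None) becomes a chosen in-band value."""
    return replacement if value is None else value


def make_null(value, in_band_null):
    """The in-band sentinel becomes an out-of-band null (None)."""
    return None if value == in_band_null else value
```

Applying one conversion and then the other returns the original null, provided the sentinel never occurs as genuine data. That is why an "impossible" value is chosen for the in-band null.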
Review
The term "metadata" is usually understood to mean information about data or the processing of the data. Business metadata incorporates knowledge that the business has about the data, such as business rules, ownership and responsibility. Technical metadata includes things like table definitions, routine code and the like. Process metadata describes what happened to the data, when, and with what result/success. DataStage stores metadata in both the central Information Server repository and in its own local repository.
The Repository toolbar in DataStage (Figure 4-3) does not reveal in which location any particular item of metadata is stored. It is organized into folders, over which you have complete control. But it is wise to follow some systematic way of storing metadata. The terms "category" and "pathname" are both used to describe the location of a particular folder, or component in a folder, in the Repository.
DataStage has two search utilities, Quick Find and Advanced Find. The latter has more filters, and allows a greater range of things to be done with the results of the search.
[…] "table definition", a term that encapsulates any collection of column definitions. These can be imported using a number of different tools. It is also possible to export any combination of components from the Repository into a file that can be subsequently used to import some or all of these components into another DataStage project.
NULL is a concept, that of a data item whose value is unknown. Every database has its own way of representing NULL internally, as does the DataStage server. Functions exist to test whether a data item is null (or is not null), to substitute a value where this is true, and to generate out-of-band null where needed. Some activities, such as outer joins, can also return NULL.
Further Reading
Parallel Job Developer's Guide, Chapter 2
Designer Client Guide, Chapters 2 and 13