Date posted: 14-Jan-2015 | Category: Technology | Uploaded by: data-blueprint
Produced by Data Blueprint, 10124-C W. Broad St, Glen Allen, VA 23060
Classification: Education | Date: 11/06/12
© Copyright this and previous years by Data Blueprint - all rights reserved!
Welcome!
Date: November 13, 2012
Time: 2:00 PM ET
Presenter: Dr. Peter Aiken
Get the Most out of Your Tools: Data Management Technologies
Get Social With Us!
Live Twitter Feed: Join the conversation!
Follow us: @datablueprint, @paiken
Ask questions and submit your comments: #dataed
Like Us on Facebook: www.facebook.com/datablueprint
Post questions and comments. Find industry news, insightful content and event updates.
Join the Group: Data Management & Business Intelligence
Ask questions, gain insights and collaborate with fellow data management professionals
Meet Your Presenter: Dr. Peter Aiken
• Internationally recognized thought-leader in the data management field, with 30 years of experience
• Recipient of multiple international awards
• Founder, Data Blueprint http://datablueprint.com
• 7 books and dozens of articles
• Experienced with 500+ data management practices in 20 countries
• Multi-year immersions with organizations as diverse as the US DoD, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia and Walmart
Data Management Technologies
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10. Q&A
Tweeting now: #dataed
The DAMA Guide to the Data Management Body of Knowledge
Figure: Data Management Functions
• Published by DAMA International, the professional association for data managers (40 chapters worldwide)
• The DMBoK is organized around:
– Primary data management functions focused on data delivery to the organization
– Several environmental elements
The DAMA Guide to the Data Management Body of Knowledge
Figure: Environmental Elements
Amazon: http://www.amazon.com/DAMA-Guide-Management-Knowledge-DAMA-DMBOK/dp/0977140083
Or enter the terms "dama dm bok" at the Amazon search engine
What is the CDMP?
• Certified Data Management Professional
• DAMA International and ICCP
• Membership in a distinct group made up of your fellow professionals
• Recognition for your specialized knowledge in a choice of 17 specialty areas
• Series of 3 exams
• For more information, please visit:
– http://www.dama.org/i4a/pages/index.cfm?pageid=3399
– http://iccp.org/certification/designations/cdmp
Data Management
• Data Program Coordination: Manage data coherently.
• Organizational Data Integration: Share data across boundaries.
• Data Stewardship: Assign responsibilities for data.
• Data Development: Engineer data delivery systems.
• Data Support Operations: Maintain data availability.
2. Data Management Tools Overview
Tools and Methods Are Required!
Sample Existing Environment
Figure: a sample environment of RDBMS 1 (Finance, HR), RDBMS 2 (Marketing), R&D #1, R&D #2, R&D #3, a network database, back office applications, and manufacturing and logistics systems on flat files
Figure: reverse engineering works from the existing (As Is) information requirements, data design, and data implementation assets; forward engineering produces the new (To Be) requirements, design, and data implementation assets.
Reengineering is typically the solution to the problem…
Example Query Outputs
Data Management Technologies
• Managing data technology should follow the same principles and standards as managing any other technology
• The leading reference model for technology management is the Information Technology Infrastructure Library (ITIL): http://www.itil-officialsite.com/home/home.asp
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Understanding Data Technology Requirements
Need to understand:
• How the technology works
• How it provides value in the context of a particular business
• The requirements of a data technology before determining what technical solution to choose for a particular situation
Suggested questions:
• What problem is this data technology meant to solve?
• What sets this data technology apart from others?
• Are there specific hardware/software/operating system/storage/network/connectivity requirements?
• Does this technology include data security functionality?
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
3. Data Technology Architecture
Defining Data Technology Architecture
• Data technology is part of the overall technology architecture
• It is also often considered part of the enterprise’s data architecture
• Data technology architecture addresses 3 questions:
– What technologies are standard/required/preferred/acceptable?
– Which technologies apply to which purposes and circumstances?
– In a distributed environment, which technologies exist where, and how does data move from one node to another?
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data Technology Architecture, cont’d
Data technologies to be included in the technology architecture:
• Database management systems (DBMS) software
• Related database management utilities
• Data modeling and model management software
• Business intelligence software for reporting and analysis
• Extract-transform-load (ETL) and other data integration tools
• Data quality analysis and data cleansing tools
• Metadata management software, including metadata repositories
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Data Technology Architecture, cont’d
• The technology roadmap for the organization consists of technology objectives as well as reviewed, approved, and published technology architecture components
• This strategic roadmap can be used to inform and direct future data technology research and project work
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Polling Question #1
What is one important thing to understand about technology?
a) It is sometimes free
b) Buying the same technology that everyone else is using, and using it in the same way, will create business value
c) It should always be regarded as the means to an end, rather than the end itself
Data Technology Architecture, cont’d
• It is important to understand several things about technology:
– It is never free. Even open-source technology requires care and feeding.
– It should always be regarded as the means to an end, rather than the end itself.
– Most importantly: buying the same technology that everyone else is using, and using it in the same way, does not create business value or competitive advantage.
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
4. CASE Tools
Computer Aided Software/Systems Engineering Tools
• Scientific application of a set of tools and methods to a software system, meant to result in high-quality, defect-free, and maintainable software products
• Refers to methods for the development of information systems together with automated tools that can be used in the software development process
• CASE functions include analysis, design, and programming
Source: http://en.wikipedia.org/wiki/
CASE Tools
CASE Tools: Example(s)
• Microsoft
– Visio
– PowerPoint
– Excel
• ERwin
• ER/Studio
List of CASE Tools: http://www.unl.csi.cuny.edu/faqs/software-enginering/tools.html
Figure 18.2 Sample budget: implementing a $2,500/seat CASE technology can cost $2.5 million over a 5-year period
[adapted from Huff "Elements of a Realistic CASE Tool Adoption Budget" © 1992 Communications of the ACM]
$187.5K = $2,500/seat × 75 seats
$360K = training
$500K = workstations
$150K = assessment costs
$1,197.5K = total initial investment
$150K = in-house support
$55K = hardware and software maintenance
$60K = ongoing training and misc.
$265K = annual additional investment
× 5 years = $1,325K additional investment over 5 years
Total ≈ $2.5 million ($1,197.5K initial + $1,325K ongoing)
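The budget arithmetic can be rechecked with a short script. The line items are the ones listed on the slide; the variable and dictionary key names are illustrative only.

```python
# Recompute the sample CASE adoption budget from the slide's line items (US dollars).
SEATS = 75
PRICE_PER_SEAT = 2_500
YEARS = 5

initial_items = {
    "licenses": SEATS * PRICE_PER_SEAT,  # 75 seats x $2,500/seat
    "training": 360_000,
    "workstations": 500_000,
    "assessment": 150_000,
}
annual_items = {
    "in_house_support": 150_000,
    "hw_sw_maintenance": 55_000,
    "ongoing_training_misc": 60_000,
}

initial_total = sum(initial_items.values())
annual_total = sum(annual_items.values())
five_year_total = initial_total + annual_total * YEARS

print(f"Initial investment:  ${initial_total:,}")
print(f"Annual additional:   ${annual_total:,}")
print(f"Total over {YEARS} years: ${five_year_total:,}")
```

The five-year total comes to roughly $2.52 million, which agrees with the $2.5 million figure in the caption.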
CASE Tool: "Taxonomy"
[adapted from Jones "Assessment and Control of Software Risks" © 1994 Prentice-Hall]
• Senders: flows from the CASE effort that can inform the re-architecting effort.
• Receivers: flows from the project that can inform the CASE effort.
• Senders and receivers: some elements, such as restructuring and reengineering, are both senders and receivers.
CASE-based XML Support
http://www.visible.com
Changing Model of CASE Tool Usage
Old model:
• Everything must "fit" into one CASE technology
• CASE tool-specific methods and technologies
• Limited access from outside the CASE technology environment
• Limited additional metadata use
New model:
• A variety of CASE-based methods and technologies can access and update the metadata
• XML integration
• Additional metadata uses accessible via: web; portal; XML; RDBMS
5. Repositories
Repositories have been difficult to "sell"
21 September 1999, Michael Blechar, Lisa Wallace
Management Summary
Most executive and IS managers view an IT metadata repository as an esoteric technology that is not directly related to the business. However, as will be seen, an IT metadata repository can substantially help IS organizations support the applications, which in turn support the business. An IT metadata repository is a pre-built system and reference database where the IS organizations can track and manage the information about the applications and databases they build and maintain; think of it as the inventory and change impact reporting system for IS. These repositories track metadata such as the descriptions of jobs, programs, modules, screens, data and databases, and the interrelationships between them. Metadata differs from the actual data being described. Metadata is information about data. For example, the metadata descriptions in the repository tell one that the field "customer number" appears in Databases A, B and F ...
[From gartner.com]
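The "inventory and change impact reporting" role described in the quote can be illustrated with a plain in-memory where-used index. This is a minimal sketch, not any vendor's repository; the class name and the program name BILLING01 are hypothetical, and the databases echo the "customer number" example above.

```python
from __future__ import annotations

from collections import defaultdict


class MetadataIndex:
    """Toy where-used index: data element -> artifacts (databases, programs) that use it."""

    def __init__(self) -> None:
        self._usage: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)

    def record(self, element: str, artifact_type: str, artifact: str) -> None:
        """Record that a data element appears in an artifact of the given type."""
        self._usage[element].add((artifact_type, artifact))

    def where_used(self, element: str, artifact_type: str | None = None) -> list[str]:
        """Sorted names of artifacts using the element, optionally filtered by artifact type."""
        return sorted(
            name
            for kind, name in self._usage.get(element, set())
            if artifact_type is None or kind == artifact_type
        )


index = MetadataIndex()
for db in ("Database A", "Database B", "Database F"):
    index.record("customer number", "database", db)
index.record("customer number", "program", "BILLING01")  # hypothetical program
index.record("order date", "database", "Database C")

# Change-impact question: if "customer number" changes, which databases are affected?
print(index.where_used("customer number", "database"))
```

A real repository would also record the interrelationships the quote mentions (jobs, modules, screens), but the lookup pattern is the same.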
What tools do you use?
Repository Technologies in Use (Number Responding = 181)
• None: 45%
• HomeGrown: 23%
• Other: 13%
• CA Platinum: 9%
• Rochade: 7%
• Universal Repository: 2%
• DesignBank: 1%
• DWGuide: 1%
• InfoManager: 1%
• Interface Metadata Tool: 1%
• Almost one in two organizations (45%) doesn't use repository technology
• Almost one in four organizations (23%) is building their own repository technology
• The "traditional" players (CA & Rochade) are in use in 16% of organizations surveyed
Repository Evolution
Traditional
§ Passive analysis
§ Relational & data warehouse
§ Batch & reports
§ Optional, not critical
§ Proprietary & OIM
Evolving
§ Standards for investment protection: MOF
§ Openness, simplification & choice: XMI
§ Diverse metadata management (including messaging)
§ Real time and ad hoc for decision support
§ Daily business value within a production architecture
"However, due to cost (these tools start at about $150,000, but frequently exceed $1 million) and being slow to market in terms of support for new service-oriented architectures (SOAs), CA and ASG have opened the door to smaller competitors"
Metadata Repositories 2004
IBM AD/Cycle
• Business Goals Model: Defines the mission of the enterprise, its long-range goals, and the business policies and assumptions that affect its operations.
• Business Rules Model: Records rules that govern the operation of the business and the Business Events that trigger execution of Business Processes.
• Enterprise Structure Model: Defines the scope of the enterprise to be modeled. Assigns a name to the model that serves to qualify each component of the model.
• Extension Support Model: Provides for tactical Information Model extensions to support special tool needs.
• Info Usage Model: Specifies which of the Entity-Relationship Model component instances are used by other Information Model components.
• Global Text Model: Supports recording of extended descriptive text for many of the Information Model components.
• DB2 Model: Refines the definition of a Relational Database design to a DB2-specific design.
• IMS Structures Model: Defines the component structures and elements and the application program views of an IMS Database.
• Flow Model: Specifies which of the Entity-Relationship Model component instances are passed between Process Model components.
• Application Structure Model: Defines the overall scope of an automated Business Application, the components of the application and how they fit together.
• Data Structures Model: Defines the data structures and their elements used in an automated Business Application.
• Application Build Model: Defines the tools, parameters and environment required to build an automated Business Application.
• Derivations/Constraints Model: Records the rules for deriving legal values for instances of Entity-Relationship Model components, and for controlling the use or existence of E-R instances.
• Entity-Relationship Model: Defines the Business Entities, their properties (attributes) and the relationships they have with other Business Entities.
• Organization/Location Model: Records the organization structure and location definitions for use in describing the enterprise.
• Process Model: Defines Business Processes, their subprocesses and components.
• Relational Database Model: Describes the components of a Relational Database design in terms common to all SAA relational DBMSs.
• Test Model: Identifies the various files (test procedures, test cases, etc.) affiliated with an automated Business Application for use in testing that application.
• Library Model: Records the existence of non-repository files and the role they play in defining and building an automated Business Application.
• Panel/Screen Model: Identifies the Panels and Screens and the fields they contain as elements used in an automated Business Application.
• Program Elements Model: Identifies the various pieces and elements of application program source that serve as input to the application build process.
• Value Domain Model: Defines the data characteristics and allowed values for information items.
• Strategy Model: Records business strategies to resolve problems, address goals, and take advantage of business opportunities. It also records the actions and steps to be taken.
• Resource/Problem Model: Identifies the problems and needs of the enterprise, the projects designed to address those needs, and the resources required.
IBM's AD/Cycle Information Model
Implementing Metadata Repository Functionality
• "The repository" does not have to be an integrated solution; it must be an easily integratable solution
• Repository functionality does not equal a repository; metadata must easily evolve into a repository solution
• Multiple repositories are not necessarily bad; as interim solutions, Excel has been working quite well
• Minimal functionality includes the ability to create, read, update, delete, and evolve metadata items
• Remember the 1st law of data management: in order to manage metadata, you need metadata
Figure: repository functions
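The minimal functionality listed above (create, read, update, delete, and evolve metadata items) can be prototyped before any repository product is bought. This sketch assumes "evolve" means applying a change while keeping prior definitions as history; the class names, fields, and sample item are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataItem:
    name: str
    attributes: dict[str, str]
    version: int = 1
    history: list[dict[str, str]] = field(default_factory=list)  # earlier attribute sets


class MetadataStore:
    """In-memory store offering create, read, update, delete, and evolve (versioned update)."""

    def __init__(self) -> None:
        self._items: dict[str, MetadataItem] = {}

    def create(self, name: str, **attributes: str) -> MetadataItem:
        if name in self._items:
            raise KeyError(f"{name!r} already exists")
        item = MetadataItem(name, dict(attributes))
        self._items[name] = item
        return item

    def read(self, name: str) -> MetadataItem:
        return self._items[name]

    def update(self, name: str, **attributes: str) -> MetadataItem:
        """Overwrite attributes in place; no history is kept."""
        item = self._items[name]
        item.attributes.update(attributes)
        return item

    def evolve(self, name: str, **attributes: str) -> MetadataItem:
        """Apply a change but keep the previous definition and bump the version."""
        item = self._items[name]
        item.history.append(dict(item.attributes))
        item.attributes.update(attributes)
        item.version += 1
        return item

    def delete(self, name: str) -> None:
        del self._items[name]


store = MetadataStore()
store.create("customer number", type="CHAR(8)", steward="Sales")
store.evolve("customer number", type="CHAR(10)")
item = store.read("customer number")
print(item.version, item.attributes["type"], item.history[0]["type"])
```

Separating `update` from `evolve` reflects the slide's distinction: some corrections are simple overwrites, while definitional changes deserve a traceable history.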
6. Profiling/Discovery Tools
Data Discovery Technologies
• Data analysis software technologies deliver up to 10X productivity over manual approaches
• Based on a powerful computing technology that allows data engineers to quickly form candidate hypotheses with respect to the existing data structures
• Hypotheses are then presented to the SMEs (both business and technical), who confirm, refine, or deny them
• Allows existing data structures to be inferred at a rate an order of magnitude faster than previous manual approaches
• Pioneers include Evoke->CSI, Metagenix->Ascential->IBM, Sypherlink
Profiling Discovery Analysis
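The hypothesis-forming step can be illustrated with a brute-force check for single-column functional dependencies: if every value of column A is paired with exactly one value of column B in the sample, "A determines B" becomes a candidate for the SMEs to confirm, refine, or deny. Commercial discovery tools use far more scalable algorithms; the extract and column names here are hypothetical.

```python
from itertools import permutations

# Hypothetical legacy extract: rows as dicts keyed by column name.
rows = [
    {"emp_id": "E1", "dept_code": "D10", "dept_name": "Finance"},
    {"emp_id": "E2", "dept_code": "D10", "dept_name": "Finance"},
    {"emp_id": "E3", "dept_code": "D20", "dept_name": "Marketing"},
    {"emp_id": "E4", "dept_code": "D30", "dept_name": "Marketing"},
]


def holds(rows, lhs, rhs):
    """True if, in these rows, each lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def candidate_dependencies(rows):
    """(lhs, rhs) column pairs whose dependency holds in the sample; hypotheses only."""
    columns = list(rows[0])
    return [(a, b) for a, b in permutations(columns, 2) if holds(rows, a, b)]


for lhs, rhs in candidate_dependencies(rows):
    print(f"hypothesis: {lhs} -> {rhs}")
```

Because `emp_id` is unique in the sample, it trivially "determines" every other column; deciding which hypotheses reflect real business rules is exactly the SME review step.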
How has this been done in the past?
Old
• Manually
• Brute force
• Repository dependent
• Quality indifferent
• Not repeatable
New
• Semi-automated
• Engineered
• Repository independent
• Integrated quality
• Repeatable
• Currency
• Accuracy
Select an attribute to get a list of values
Double-click a value to see rows with that value
7. Data Quality Engineering Tools
Data Quality Engineering Tools
4 categories of activities:
1) Analysis
2) Cleansing
3) Enhancement
4) Monitoring
Principal tools:
1) Data Profiling
2) Parsing and Standardization
3) Data Transformation
4) Identity Resolution and Matching
5) Enhancement
6) Reporting
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
DQ Tools: (1) Data Profiling
• Need to be able to distinguish between good and bad data before making any improvements
• Data profiling is a set of algorithms for 2 purposes:
– Statistical analysis and assessment of the data quality values within a data set
– Exploring relationships that exist between value collections within and across data sets
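Both purposes can be sketched in a few lines: per-column statistics (rows, nulls, distinct values, most frequent values) for the first, and a value-overlap check between two columns, one common way to surface candidate foreign keys, for the second. The sample data and the choice of statistics are assumptions for illustration.

```python
from collections import Counter


def profile_column(values):
    """Basic statistics for one column; None is treated as null."""
    non_null = [v for v in values if v is not None]
    freq = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(freq),
        "top": freq.most_common(2),
    }


def overlap(child_values, parent_values):
    """Fraction of distinct non-null child values that also appear in the parent column."""
    child = {v for v in child_values if v is not None}
    if not child:
        return 0.0
    return len(child & set(parent_values)) / len(child)


# Hypothetical data sets: order lines reference customers by id.
customer_ids = ["C1", "C2", "C3"]
order_customer_ids = ["C1", "C1", "C2", None, "C9"]

print(profile_column(order_customer_ids))
print(overlap(order_customer_ids, customer_ids))
```

An overlap below 1.0 (here caused by "C9") flags order lines pointing at customers that do not exist: a data quality finding to investigate, not proof that the relationship is wrong.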
DQ Tools: (2) Parsing & Standardization
• Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
• Actions are triggered upon matching a specific pattern
• When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
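A minimal pattern-driven parser for phone numbers might look like this. The rule list, the North American "(NNN) NNN-NNNN" target format, and the repair action are assumptions chosen for illustration; production tools ship much larger, configurable pattern libraries.

```python
import re

# Ordered rules: (status, compiled pattern, action). The first matching pattern wins.
RULES = [
    ("valid", re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"), lambda m: m.group(0)),
    # Exactly ten digits with any separators: repair into the standard form.
    ("repairable", re.compile(r"^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$"),
     lambda m: f"({m.group(1)}) {m.group(2)}-{m.group(3)}"),
]


def standardize_phone(raw):
    """Return (status, value); status is 'valid', 'repairable', or 'invalid'."""
    for status, pattern, action in RULES:
        match = pattern.match(raw.strip())
        if match:
            return status, action(match)
    return "invalid", raw  # no rule applies: leave the value for manual review


for raw in ["(804) 555-0100", "804.555.0199", "555-0100"]:
    print(raw, "->", standardize_phone(raw))
```

Keeping the rules as data rather than code mirrors the slide's "rules engine": adding a new recognized pattern means adding a row, not rewriting the parser.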
DQ Tools: (3) Data Transformation
• Upon identification of data errors, trigger data rules to transform the flawed data
• Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation
• Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
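Rule-based transformation can be sketched as a small knowledge base of source formats, each mapped into one target representation. Here the assumed target is ISO 8601 dates (YYYY-MM-DD) and the source formats are hypothetical; values that match no rule are returned as None for review rather than guessed.

```python
from datetime import datetime

# Knowledge base of known source formats (an assumption for illustration).
# Ambiguous inputs such as 03/04/2012 would need an explicit business rule.
SOURCE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y", "%Y%m%d"]
TARGET_FORMAT = "%Y-%m-%d"


def to_target(value):
    """Map a date string in any known source format to the target; None if no rule applies."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue
    return None


raw_dates = ["11/13/2012", "14-Jan-2015", "20121106", "2012-13-01"]
print({raw: to_target(raw) for raw in raw_dates})
```

The order of `SOURCE_FORMATS` matters: it is the priority in which rules are tried, so more specific or more trusted formats should come first.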
DQ Tools: (4) Identity Resolution & Matching
2 basic approaches to matching:
• Deterministic
– Relies on defined patterns and rules for assigning weights and scores to determine similarity
– Predictable
– Only as good as the anticipations of the rules developers
• Probabilistic
– Relies on statistical techniques for assessing the probability that any pair of records represents the same entity
– Not reliant on rules
– Probabilities can be refined based on experience, so matchers can improve precision as more data is analyzed
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
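The two approaches can be contrasted in one sketch. The deterministic matcher applies a fixed rule; the probabilistic matcher follows the Fellegi-Sunter idea of summing per-field log-likelihood weights, where m is the probability that a field agrees for true matches and u the probability that it agrees for non-matches. The m/u values, field names, and thresholds below are made-up assumptions; in practice they are estimated from data and refined over time.

```python
import math

FIELDS = ["last_name", "first_name", "birth_year", "zip"]

# Assumed per-field (m, u) probabilities; real systems estimate these from data.
M_U = {"last_name": (0.95, 0.01), "first_name": (0.90, 0.05),
       "birth_year": (0.98, 0.02), "zip": (0.85, 0.10)}


def deterministic_match(a, b):
    """Fixed rule: same last name, birth year, and ZIP code means same entity."""
    return all(a[f] == b[f] for f in ("last_name", "birth_year", "zip"))


def match_weight(a, b):
    """Agreement adds log2(m/u); disagreement adds log2((1-m)/(1-u))."""
    total = 0.0
    for f in FIELDS:
        m, u = M_U[f]
        total += math.log2(m / u) if a[f] == b[f] else math.log2((1 - m) / (1 - u))
    return total


def probabilistic_match(a, b, upper=10.0, lower=0.0):
    """Classify a pair as 'match', 'possible' (clerical review), or 'non-match'."""
    w = match_weight(a, b)
    return "match" if w >= upper else "possible" if w > lower else "non-match"


r1 = {"last_name": "Smith", "first_name": "Jon", "birth_year": 1970, "zip": "23060"}
r2 = {"last_name": "Smith", "first_name": "John", "birth_year": 1970, "zip": "23060"}
print(deterministic_match(r1, r2), round(match_weight(r1, r2), 2), probabilistic_match(r1, r2))
```

The first-name disagreement lowers the weight but does not veto the match, which is the practical difference from a rule that demands exact agreement on every field.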
DQ Tools: (5) Enhancement
Definition:
• A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view
Examples of data enhancements:
• Time/date stamps
• Auditing information
• Contextual information
• Geographic information
• Demographic information
• Psychographic information
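Enhancement by merging a base record with additional information might be sketched as follows. The ZIP-to-region reference table, the field names, and the source-system label are hypothetical; the point is that the base entity stays intact while time stamp, auditing, and geographic attributes are layered on.

```python
from datetime import datetime, timezone

# Hypothetical reference data used for geographic enhancement.
ZIP_TO_REGION = {"23060": "Central Virginia", "10001": "New York Metro"}


def enhance(record, source_system, now=None):
    """Return a new record with time stamp, audit, and geographic attributes added."""
    now = now or datetime.now(timezone.utc)
    enhanced = dict(record)  # copy, so the base record is left unchanged
    enhanced["loaded_at"] = now.isoformat()    # time/date stamp
    enhanced["source_system"] = source_system  # auditing information
    enhanced["region"] = ZIP_TO_REGION.get(record.get("zip"), "UNKNOWN")  # geographic
    return enhanced


base = {"customer_id": "C1", "zip": "23060"}
fixed_time = datetime(2012, 11, 13, 19, 0, tzinfo=timezone.utc)
print(enhance(base, "CRM", now=fixed_time))
```

Passing `now` explicitly keeps the function deterministic for testing; in production it defaults to the current UTC time.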
DQ Tools: (6) Reporting
Good reporting supports:
• Inspection and monitoring of conformance to data quality expectations
• Monitoring performance of data stewards conforming to data quality SLAs
• Workflow processing for data quality incidents
• Manual oversight of data cleansing and correction
Associate report results with:
• Data quality measurement
• Metrics
• Activity
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
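Associating report results with measurements and metrics can be sketched as computing a conformance rate per data quality rule and comparing it with an agreed threshold. The rules, records, and 95% SLA threshold are assumptions for illustration.

```python
# Hypothetical data quality rules: name -> predicate over a record.
RULES = {
    "email present": lambda r: bool(r.get("email")),
    "zip is 5 digits": lambda r: str(r.get("zip", "")).isdigit() and len(str(r.get("zip", ""))) == 5,
}
SLA_THRESHOLD = 0.95  # assumed service-level expectation for every rule


def dq_report(records, rules=RULES, threshold=SLA_THRESHOLD):
    """One row per rule: (rule name, conformance rate, meets SLA?)."""
    report = []
    for name, check in rules.items():
        passed = sum(1 for r in records if check(r))
        rate = passed / len(records) if records else 1.0
        report.append((name, round(rate, 3), rate >= threshold))
    return report


records = [
    {"email": "a@example.com", "zip": "23060"},
    {"email": "", "zip": "23060"},
    {"email": "c@example.com", "zip": "23220"},
    {"email": "d@example.com", "zip": "10001"},
]
for rule, rate, ok in dq_report(records):
    print(f"{rule:<16} {rate:>6.1%}  {'OK' if ok else 'BREACH'}")
```

Each breach row is a natural trigger for the incident workflow and steward follow-up listed above.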
8. Data Life Cycle
Traditional Quality Life Cycle
Data Life Cycle Model
Figure: stages shown are Metadata Creation, Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Utilization, Data Manipulation, Data Assessment, and Data Refinement
Startingpointfor newsystemdevelopment
data performance metadata
data architecture
dataarchitecture and
data models
shared data updated data
correcteddata
architecturerefinements
facts &meanings
Metadata &Data Storage
Starting pointfor existingsystems
Metadata Refinement
• Correct Structural Defects
• Update Implementation
Metadata Creation
• Define Data Architecture
• Define Data Model Structures
Metadata Structuring
• Implement Data Model Views
• Populate Data Model Views
Data Refinement
• Correct Data Value Defects
• Re-store Data Values
Data Manipulation
• Manipulate Data
• Update Data
Data Utilization
• Inspect Data
• Present Data
Data Creation
• Create Data
• Verify Data Values
Data Assessment
• Assess Data Values
• Assess Metadata
Extended data life cycle model with metadata sources and uses
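In the model, Data Assessment covers both data values and metadata, and its findings feed two different phases. Value defects go to Data Refinement; structural defects go to Metadata Refinement. The Python sketch below shows one way that routing could work. The declared-type metadata and the sample rows are hypothetical. The rule "if no value fits the declared type, suspect the declaration" is an assumed heuristic, not part of the model.

```python
from enum import Enum


class Phase(Enum):
    METADATA_REFINEMENT = "Metadata Refinement"  # correct structural defects
    DATA_REFINEMENT = "Data Refinement"          # correct data value defects


def conforms(value, expected_type):
    """True if `value` can be converted to the declared type."""
    try:
        expected_type(value)
        return True
    except (TypeError, ValueError):
        return False


def assess(rows, metadata):
    """Data Assessment: check stored values against declared metadata and route
    each finding to the refinement phase that should handle it."""
    findings = []
    observed = {column for row in rows for column in row}
    # A column present in the data but absent from the metadata is a structural gap.
    for column in sorted(observed - set(metadata)):
        findings.append((Phase.METADATA_REFINEMENT, column, "not described in metadata"))
    for column, expected in metadata.items():
        values = [row[column] for row in rows if column in row]
        bad = [v for v in values if not conforms(v, expected)]
        if values and len(bad) == len(values):
            # Assumed heuristic: if no value fits, the declaration is more likely wrong.
            findings.append((Phase.METADATA_REFINEMENT, column,
                             f"no values parse as {expected.__name__}"))
        elif bad:
            findings.append((Phase.DATA_REFINEMENT, column, f"invalid values: {bad}"))
    return findings


metadata = {"order_id": int, "amount": float, "qty": int}  # hypothetical declared model
rows = [
    {"order_id": "1001", "amount": "19.99", "qty": "one", "region": "east"},
    {"order_id": "1002", "amount": "n/a", "qty": "two", "region": "west"},
    {"order_id": "1003", "amount": "5.00", "qty": "three", "region": "east"},
]
findings = assess(rows, metadata)
for phase, column, detail in findings:
    print(f"{phase.value}: {column}: {detail}")
```

In this sample, `region` and `qty` are routed to Metadata Refinement and the single bad `amount` value is routed to Data Refinement.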
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10. Q&A
Tweeting now: #dataed
Other Technologies
Data Integration Definition:
• Pulling together and reconciling dispersed data for analytic purposes that organizations have maintained in multiple, heterogeneous systems. Data needs to be accessed and extracted, moved and loaded, validated and cleaned, standardized and transformed.
• Other tools include:
– Servers
– EII (enterprise information integration) technologies
– Portals
– Conversion tools
Source: http://www.information-management.com
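The definition above lists the activities data integration involves. The Python sketch below applies them to two hypothetical source systems: a CRM export in CSV and a billing feed in JSON. It extracts both, validates and cleans them, standardizes them to one key and one set of types, reconciles them into a single record per customer, and loads the result. The source formats, field names, and the in-memory "warehouse" list are assumptions for illustration. The step order follows the conventional ETL sequence; the slide does not prescribe one.

```python
import csv
import io
import json

# Two hypothetical source systems holding overlapping customer data in different formats.
CRM_CSV = """cust_id,name,phone
1, Ada Lovelace ,555-0100
2,Grace Hopper,
"""
BILLING_JSON = ('[{"customerId": "1", "balance": "12.50"}, '
                '{"customerId": "2", "balance": "0"}, '
                '{"customerId": "x9", "balance": "3"}]')


def extract():
    """Access and extract: read each source in its native format."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    billing = json.loads(BILLING_JSON)
    return crm, billing


def clean_and_standardize(crm, billing):
    """Validate and clean, then standardize both sources to an integer key."""
    people = {}
    for row in crm:
        people[int(row["cust_id"])] = {"name": row["name"].strip(),
                                       "phone": row["phone"] or None}
    balances = {}
    for rec in billing:
        if not rec["customerId"].isdigit():  # validation: drop keys that cannot be reconciled
            continue
        balances[int(rec["customerId"])] = float(rec["balance"])
    return people, balances


def transform(people, balances):
    """Reconcile the two sources into one integrated record per customer."""
    return [{"id": cid, **attrs, "balance": balances.get(cid, 0.0)}
            for cid, attrs in sorted(people.items())]


def load(records, target):
    """Move and load: a list stands in for a warehouse table."""
    target.extend(records)
    return len(records)


warehouse = []
loaded = load(transform(*clean_and_standardize(*extract())), warehouse)
print(loaded, warehouse)
```

A production pipeline would also log the rejected `x9` billing record, not drop it silently. It is dropped here only to keep the sketch short.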
Polling Question #2
Which is not a strategic technology trend in 2013?
a) Hybrid IT and Cloud Computing
b) App and Cloud Computing
c) Personal Cloud
Top 10 Strategic Tech Trends in 2013
1. Mobile Device Battles: By 2013, mobile phones will overtake PCs as the most common Web access device worldwide.
2. Mobile Applications and HTML5: For the next few years, no single tool will be optimal for all types of mobile application, so expect to employ several.
3. Personal Cloud: The personal cloud will gradually replace the PC as the location where individuals keep their personal content.
4. Enterprise App Stores: Enterprises face a complex app store future, as some vendors will limit their stores to specific devices and types of apps, forcing the enterprise to deal with multiple stores.
5. The Internet of Things: The Internet of Things (IoT) is a concept that describes how the Internet will expand as physical items such as consumer devices and physical assets are connected to the Internet.
Source: http://www.gartner.com/it/page.jsp?id=2209615
Top 10 Strategic Tech Trends in 2013
6. Hybrid IT and Cloud Computing: As staffs have been asked to do more with less, IT departments must play multiple roles in coordinating IT-related activities, and cloud computing is now pushing that change to another level.
7. Strategic Big Data: Big Data is moving from a focus on individual projects to an influence on enterprises’ strategic information architecture.
8. Actionable Analytics: Analytics is increasingly delivered to users at the point of action and in context.
9. In-Memory Computing: In-memory computing (IMC) can also provide transformational opportunities.
10. Integrated Ecosystems: The market is undergoing a shift to more integrated systems and ecosystems and away from loosely coupled heterogeneous approaches.
Source: http://www.gartner.com/it/page.jsp?id=2209615
XML Integration Server Requirements
• Traditional Integration with Existing Systems
– Message-Oriented Middleware
– “EAI” Adapters
• Validation
– Using XML Schema or DTD
• Query Multiple Integration Points using XQuery
• Ease of Defining Mappings
– XML to Existing Systems
– Existing Systems Creating XML
• APIs for XML
Adapted from Steve Hamby "Understanding XML Servers" DAMA/Metadata Conference April 2003, Orlando, FL
XML Server Types: Integration, Mediation, Repository
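The "Ease of Defining Mappings" requirement runs in both directions: from XML to existing systems, and from existing systems back to XML. The Python sketch below keeps the mapping as data, a table from legacy column names to element paths, so that one table drives both directions. The order document, the legacy column names, and the `@attr` path convention are hypothetical. It uses the standard library's ElementTree, which supports only a limited XPath subset and does not implement XQuery.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<orders>
  <order id="A1"><customer>Acme</customer><total currency="USD">120.00</total></order>
  <order id="A2"><customer>Globex</customer><total currency="USD">75.50</total></order>
</orders>"""

# Hypothetical mapping: legacy column name -> path relative to <order>
# ("@name" means an attribute of <order>, anything else a child element).
ORDER_MAPPING = {"ORD_NO": "@id", "CUST_NM": "customer", "ORD_AMT": "total"}


def xml_to_rows(xml_text, mapping):
    """XML to existing systems: flatten each <order> into a legacy-style row."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.iterfind("order"):
        row = {}
        for column, path in mapping.items():
            row[column] = order.get(path[1:]) if path.startswith("@") else order.findtext(path)
        rows.append(row)
    return rows


def rows_to_xml(rows, mapping):
    """Existing systems creating XML: rebuild <order> elements from legacy rows."""
    root = ET.Element("orders")
    for row in rows:
        order = ET.SubElement(root, "order")
        for column, path in mapping.items():
            if path.startswith("@"):
                order.set(path[1:], row[column])
            else:
                ET.SubElement(order, path).text = row[column]
    return ET.tostring(root, encoding="unicode")


rows = xml_to_rows(ORDER_XML, ORDER_MAPPING)
xml_out = rows_to_xml(rows, ORDER_MAPPING)
print(rows)
print(xml_out)
```

The round trip drops the `currency` attribute because the mapping table never mentions it. An integration server's mapping tools exist to make that kind of gap visible.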
XML Mediation Server Requirements
• XML Standards Based
– Ensures eXtensibility
– Changing documents / applications
– Transformation to new outputs
• Validation
– Using XML Schema or DTD
– Business Rules
• Integration with Existing Systems / Integration Servers
• Ease of Defining Rules via GUI for Business User
– IT Should Not Have to be Involved
Adapted from Steve Hamby "Understanding XML Servers" DAMA/Metadata Conference April 2003, Orlando, FL
XML Server Types: Integration, Mediation, Repository
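Business-rule validation is easiest to hand to business users when the rules are data rather than code. The slide envisions a GUI; here the rules are a plain table. The Python sketch below evaluates such a table against an XML document. The rule table, the three check names, and the `path/@attr` convention are hypothetical. Schema or DTD validation, the other item on the slide, would need a validating parser, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET


def _required(value, _arg):
    return value is not None and value.strip() != ""


def _at_most(value, limit):
    try:
        return float(value) <= limit
    except (TypeError, ValueError):
        return False


def _one_of(value, allowed):
    return value in allowed


CHECKS = {"required": _required, "at_most": _at_most, "one_of": _one_of}

# Hypothetical rule table of the kind a business user might maintain.
RULES = [
    {"path": "customer", "check": "required", "arg": None,
     "message": "customer is required"},
    {"path": "total", "check": "at_most", "arg": 1000.0,
     "message": "total exceeds approval limit of 1000"},
    {"path": "total/@currency", "check": "one_of", "arg": ["USD", "EUR"],
     "message": "unsupported currency"},
]


def value_at(doc, path):
    """Element text for 'a/b', attribute value for 'a/b/@attr'; None if absent."""
    if "/@" in path:
        element_path, attr = path.split("/@")
        element = doc.find(element_path)
        return None if element is None else element.get(attr)
    return doc.findtext(path)


def validate(doc_text, rules):
    """Return the message of every rule the document violates."""
    doc = ET.fromstring(doc_text)
    return [rule["message"] for rule in rules
            if not CHECKS[rule["check"]](value_at(doc, rule["path"]), rule["arg"])]


GOOD = '<order id="A1"><customer>Acme</customer><total currency="USD">120.00</total></order>'
BAD = '<order id="A3"><customer> </customer><total currency="GBP">2500</total></order>'
print(validate(GOOD, RULES))
print(validate(BAD, RULES))
```

Changing the approval limit or adding a currency only requires editing `RULES`. That is the point of the "IT should not have to be involved" requirement.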
XML Repository Server Requirements
• XML Optimization
– Document Instance
• XML Storage
– Stores Document in Native Format
  • Better performance
  • Non-repudiation
– Compression
• XML Standards Support
– Faster Development
– Ensures Extensibility
• Support Data Access Security at Node level
Adapted from Steve Hamby "Understanding XML Servers" DAMA/Metadata Conference April 2003, Orlando, FL
XML Server Types: Integration, Mediation, Repository
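Several of the repository requirements fit in a small in-memory Python sketch: storing the document in its native form, compressing it, detecting tampering, and filtering at the node level. The `XmlRepository` class and the `role` attribute convention for node-level security are hypothetical. A SHA-256 digest only gives tamper evidence. True non-repudiation would require digital signatures, which this sketch does not attempt.

```python
import hashlib
import xml.etree.ElementTree as ET
import zlib


class XmlRepository:
    """Minimal in-memory sketch of an XML repository server (hypothetical class)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (compressed native bytes, SHA-256 hex digest)

    def store(self, doc_id, xml_text):
        native = xml_text.encode("utf-8")
        ET.fromstring(native)  # reject documents that are not well-formed XML
        self._docs[doc_id] = (zlib.compress(native), hashlib.sha256(native).hexdigest())
        return self._docs[doc_id][1]

    def retrieve(self, doc_id):
        """Return the document byte-for-byte as stored (native format)."""
        compressed, digest = self._docs[doc_id]
        native = zlib.decompress(compressed)
        if hashlib.sha256(native).hexdigest() != digest:
            raise ValueError("stored document failed its integrity check")
        return native.decode("utf-8")

    def retrieve_for(self, doc_id, allowed_roles):
        """Node-level security: drop any element whose 'role' is not allowed."""
        root = ET.fromstring(self.retrieve(doc_id))
        _prune(root, allowed_roles)
        return ET.tostring(root, encoding="unicode")


def _prune(element, allowed_roles):
    for child in list(element):
        role = child.get("role")
        if role is not None and role not in allowed_roles:
            element.remove(child)
        else:
            _prune(child, allowed_roles)


EMP_XML = ('<employee id="7"><name>Pat Smith</name>'
           '<salary role="hr">85000</salary><office>Richmond</office></employee>')
repo = XmlRepository()
repo.store("emp-1", EMP_XML)
print(repo.retrieve_for("emp-1", {"staff"}))
```

Because the stored bytes are the original document, `retrieve` returns exactly what was submitted. A repository that shreds XML into tables cannot guarantee this.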
Portal Options
[Adapted from Terry Lanham, "Designing Innovative Enterprise Portals and Implementing Them Into Your Content Strategies: Lockheed Martin’s Compelling Case Study," Web Content II: Leveraging Best-of-Breed Content Strategies, San Francisco, CA, 23 January 2001]
Top Tier Demo
Portals as a Data Quality Tool
Meta-Matrix Integration Example
ItemField
• Data extraction and conversion software solutions for transforming complex, unstructured data formats into XML for Enterprise Application Integration:
– RTF
– HTML
– HL7
– Positional (Offset-Based) reports
– TAB-delimited and other delimited reports
– EDI
• Binary documents are automatically converted to text suitable for parsing, including:
– Microsoft Word documents
– Microsoft Excel documents
– PDF documents
– COBOL programs
[Figure labels: Tamino, BizTalk]
http://www.itemfield.com/
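Two of the formats listed, positional (offset-based) reports and tab-delimited reports, show the basic conversion pattern: recover the fields, then emit XML. The Python sketch below does this with the standard library only. It does not use ItemField's product or API. The column layout, the sample report lines, and the `<report>`/`<item>` element names are hypothetical. Converting binary formats such as Word, Excel, or PDF is out of scope here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout: field name -> (start, end) character offsets.
LAYOUT = {"part_no": (0, 8), "description": (8, 28), "qty": (28, 33)}

# Sample report lines, padded to the layout above.
POSITIONAL_REPORT = "\n".join([
    f"{'P-1001':<8}{'Hex bolt, 10mm':<20}{250:>5}",
    f"{'P-2002':<8}{'Washer, steel':<20}{1200:>5}",
])
DELIMITED_REPORT = "part_no\tdescription\tqty\nP-3003\tLock nut\t75\n"


def parse_positional(text, layout):
    """Slice each non-blank line at fixed offsets and trim the padding."""
    return [
        {name: line[start:end].strip() for name, (start, end) in layout.items()}
        for line in text.splitlines()
        if line.strip()
    ]


def parse_delimited(text, delimiter="\t"):
    """Read a delimited report whose first line holds the column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def rows_to_xml(rows, root_tag="report", row_tag="item"):
    """Emit one <item> per row, with one child element per field."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for name, value in row.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


rows = parse_positional(POSITIONAL_REPORT, LAYOUT) + parse_delimited(DELIMITED_REPORT)
xml_out = rows_to_xml(rows)
print(xml_out)
```

Both parsers produce the same row shape, so one XML writer serves every input format. That separation is what lets a conversion tool add new source formats without changing its output side.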
More Data Management Tools
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
More Data Management Tools
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10. Q&A
Tweeting now: #dataed
Questions?
It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.
December Webinar:
Show Me the Money: The Business Value of Data and ROI
December 11, 2012 @ 2:00 PM to 3:30 PM ET (11:00 AM to 12:30 PM PT)
Sign up here:
• www.datablueprint.com/webinar-schedule
• www.Dataversity.net
Brought to you by:
Upcoming Events