ETL is No Longer King, Long Live SDD

How to Close the Loop from Discovery to Information (Data) to Insights (Analytics) to Outcomes (Business Processes)

A presentation by Brian McCalley of DXC Technology, Glenn Field of SiriusIQ and Gavin Robertson of WhamTech, Inc.

Data-related hurdles: Dirty, Typo/Transposition, Missing, Meaning, Duplication, Obfuscation, Governance, Location, System, Access, Security, Container, Format, Age

No secret that most organizations face major data-related hurdles

…and ANALYTICS is the prime driver to lower costs and increase revenue

…which, in turn, drives the need for applications* to have clean and understood data in specific formats

*Reporting, BI, analytics, CDI-MDM, CRM, SCM, fraud detection, anti-money laundering, ERP, etc.

Goals for an optimal data architecture

1. Complete, clean, transformed, standardized and secure data, and master data, for multiple applications

2. Near real-time – minimal update and query latency

3. Automation, including workflow and event processing

4. Support reporting, BI and analytics, including graph database

5. Minimize copies of data

6. Data discovery, metadata repository and data governance

7. Write back to data sources

Managing data in, or from, multiple disparate systems requires a new approach

CONVENTIONAL APPROACHES

• ETL: Copy and transform schemas and data to a one-size-fits-all data warehouse

• Copy: To a single Data Lake/Big Data repository

• Federate: Submit queries through adapters to source systems

• Search: E.g., Solr™/Elasticsearch™ - copy, read, parse and index data, process queries to provide data – Big Data options for storing data

CULTURAL HURDLES ✓ · SECURITY & PRIVACY HURDLES · TECHNICAL HURDLES

The new approach leverages the advantages of each conventional approach

Typical Data Warehouse/Big Data/Data Lake/Search

[Diagram: three data sources each pass through Extract, Data and Schema Transform, and Load into a Data Warehouse/Big Data/Data Lake/Search repository, which serves the application(s).]

• Queries resolved in the Data Warehouse/Big Data/Data Lake/Search
• Expensive in terms of time and cost to implement and maintain
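The Extract, Transform, Load flow described here can be sketched in a few lines. This is a minimal illustration only: the in-memory SQLite databases stand in for a source system and a warehouse, and the table and column names (`cust`, `dim_customer`) are hypothetical. It shows why the approach copies data: queries run against the transformed copy, not the source.

```python
import sqlite3

# Stand-in source system (hypothetical table and column names).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE cust (id INTEGER, nm TEXT)")
source.executemany("INSERT INTO cust VALUES (?, ?)", [(1, " ACME Corp "), (2, "Globex")])

# Stand-in warehouse with its own, different schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_customer (customer_id INTEGER, customer_name TEXT)")

def etl(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy rows from the source into the warehouse, cleaning names on the way."""
    rows = src.execute("SELECT id, nm FROM cust ORDER BY id").fetchall()   # Extract
    cleaned = [(row_id, name.strip().title()) for row_id, name in rows]    # Transform
    dst.executemany("INSERT INTO dim_customer VALUES (?, ?)", cleaned)     # Load
    dst.commit()
    return len(cleaned)

loaded = etl(source, warehouse)

# Queries are resolved in the warehouse copy, never in the source.
names = [row[0] for row in warehouse.execute(
    "SELECT customer_name FROM dim_customer ORDER BY customer_id")]
```

Every later change in the source requires another ETL run before the warehouse reflects it, which is one reason for the update latency discussed under the goals.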

Typical federated data access with conventional adapters

[Diagram: application(s) connect through middleware to an adapter/connector for each of three data sources.]

• Queries resolved mainly in the data source
• Expensive to implement and limits capabilities

ETL has been the only option to come close to meeting the goals for an optimal data architecture, until now…

Introducing Software Defined Data (SDD) consisting of unconventional federated adapters that Read, Transform (process) and Index (RTI) source data and process queries against these indexes
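To make the Read, Transform and Index (RTI) idea concrete, here is a minimal sketch under stated assumptions. It is not WhamTech's implementation: the SQLite source, the `customers` table and the `clean` function are all hypothetical. The index holds only cleaned keys and row pointers; the query is resolved in the index, and the raw rows are then read from the source and cleaned on the way out.

```python
import sqlite3
from collections import defaultdict

def clean(value: str) -> str:
    """Transform step: collapse whitespace and case-fold so dirty variants share a key."""
    return " ".join(value.split()).casefold()

# Stand-in for a source system; table and column names are hypothetical.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "  ACME Corp "), (2, "acme   corp"), (3, "Globex")],
)

def build_index(conn: sqlite3.Connection) -> dict[str, list[int]]:
    """Read + Transform + Index: map cleaned keys to row ids (pointers), not to rows."""
    index: dict[str, list[int]] = defaultdict(list)
    for row_id, name in conn.execute("SELECT id, name FROM customers ORDER BY id"):
        index[clean(name)].append(row_id)
    return dict(index)

def query(conn: sqlite3.Connection, index: dict[str, list[int]], name: str):
    """Resolve the query in the index, then fetch raw results from the source and clean them."""
    results = []
    for row_id in index.get(clean(name), []):
        (raw,) = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (row_id,)).fetchone()
        results.append((row_id, clean(raw)))
    return results

name_index = build_index(source)
matches = query(source, name_index, "Acme Corp")
```

The source rows stay dirty and in place; both the index keys and the returned results pass through the same transform, so the application sees consistent values without a copied, cleaned data set.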

SDD initial discovery, index and adapter configuration, index build and Standard Data View (SDV) mapping

[Diagram: a data source feeds Data Read, Transform/clean-up (and Index) into indexes behind an SDD Adapter w/SDV, supported by a Distributed Metadata Repository, incl. Data Governance.]

• Network Asset and Device Discovery
• Data Source Discovery
• Metadata Discovery and Semantic Mapping
• Data Classification and Data Security
• Data Discovery and raw index-based Data Profiling
• Develop and test Data Transforms using profiles
• Index schema and names same as data source
• Twelve ways to build and maintain indexes
• Indexes do not store data – only queryable representations
• Indexes mapped to Standard Data View (SDV)
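One way to picture the mapping of source-named indexes to a Standard Data View is a per-source dictionary from native field names to standard names. The sketch below is an assumption for illustration; the source names, field names and the `to_sdv` helper are hypothetical and do not describe the product's actual mapping format.

```python
# Hypothetical per-source mappings from native field names to SDV field names.
SDV_MAPPINGS = {
    "crm": {"cust_nm": "customer_name", "ph": "phone"},
    "billing": {"CUSTOMER": "customer_name", "TEL_NO": "phone"},
}

def to_sdv(source_name: str, record: dict) -> dict:
    """Rename a source record's fields to SDV names, dropping unmapped fields."""
    mapping = SDV_MAPPINGS[source_name]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

crm_view = to_sdv("crm", {"cust_nm": "Ann", "ph": "555-0100", "internal_flag": 1})
billing_view = to_sdv("billing", {"CUSTOMER": "Ann", "TEL_NO": "555-0100"})
```

Because the indexes keep the source's own schema and names, only this mapping layer has to change when a source is added or a standard field is renamed.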

SDD index update, query processing and results retrieval

[Diagram: application(s) and middleware connect to an SDD Federation Server (sub-middleware) w/SDV, which connects to further SDD Federation Servers and to SDD Adapters w/SDV; each adapter holds indexes over a data source, with multiple other data sources attached the same way, alongside a Distributed Metadata Repository, incl. Data Governance.]

• Applications/middleware connect with standard drivers, APIs, Web/data services and SQL
• User-level access
• Queries resolved in the adapter and indexes
• Result-set pointers to data in source
• Data Read, Transform/clean-up (and Index)
• Raw results data transformed/cleaned-up from source
• Results provided in almost any format
• Continuous EIQ index updates
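The fan-out role of a federation server can be sketched as follows, assuming each adapter answers a lookup from its own index and returns result-set pointers. The `Adapter` class and `federated_lookup` function are hypothetical names for illustration, not the product's API.

```python
from concurrent.futures import ThreadPoolExecutor

class Adapter:
    """Hypothetical adapter that resolves a lookup against its own index."""

    def __init__(self, name: str, index: dict):
        self.name = name
        self.index = index  # cleaned key -> list of pointers into the source

    def lookup(self, key: str) -> list:
        return [(self.name, pointer) for pointer in self.index.get(key, [])]

def federated_lookup(adapters: list, key: str) -> list:
    """Send the same lookup to every adapter in parallel and merge the pointers."""
    with ThreadPoolExecutor(max_workers=max(1, len(adapters))) as pool:
        # pool.map yields results in adapter order, so the merge is deterministic.
        partials = pool.map(lambda adapter: adapter.lookup(key), adapters)
        return [hit for partial in partials for hit in partial]

adapters = [
    Adapter("rdbms", {"acme corp": [1, 2]}),
    Adapter("mainframe", {"acme corp": ["REC-0007"]}),
    Adapter("hadoop", {}),
]
hits = federated_lookup(adapters, "acme corp")
```

Only pointers cross the network at this stage; the raw rows would be read from each source, and cleaned, when the application actually asks for the results.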

SDD adapters can co-exist with other types of adapters to System of Records

[Diagram: application(s) connect via ODBC/JDBC driver, REST API, etc., over TCP/IP to an SDD Federation Server, which reaches further SDD Federation Servers across firewalls. Sources shown: Social Media Feed, Salesforce, Hadoop, Mainframe, ERP System, RDBMS. Adapter types shown: SDD Adapter with indexes (Social Media Feed, Hadoop, Mainframe, RDBMS), SDD Conventional Adapter, 3rd Party Adapter.]

• SDD indexes and adapters can be deployed and accessed anywhere, and at any level in multiple combinations
• Indexes are 100% contiguous, regardless of where or how deployed

How does Software Defined Data compare with other approaches?

Approaches compared: SDD | ETL to a Data Warehouse | Big Data Lake | Conventional Federated Adapters | Solr/Elasticsearch

#1 Complete, clean, transformed, standardized and secure data, and master data, for multiple applications: ✓ ✓ ()

#2 Near real-time – minimize update and query latency: ✓ () (✓) (✓) (✓)

#3 Automation, including workflow and event processing: ✓ (✓)

#4 Support reporting, BI and analytics, including graph database: ✓ (✓) (✓) (✓) (✓)

#5 Minimize copies of data: ✓ ✓

#6 Data discovery, metadata repository and data governance: ✓ ✓ () (✓) ()

#7 Write back to data sources: ✓ ✓

How SDD meets goals for an optimal data architecture

Goal: Software Defined Data

#1 Complete, clean, transformed, standardized and secure data, and master data, for multiple applications
• Process source data while building and maintaining indexes and master data, and while reading raw results data
• Multiple indexes, views, means of access and result formats

#2 Near real-time – minimize update and query latency
• Change data capture
• High performance, parallel distributed processing – almost no load on data sources

#3 Automation, including workflow and event processing
• Index monitoring, REST APIs and workflow integration

#4 Support reporting, BI and analytics, including graph database
• Indexed views, provision highly curated data to analytics, run analytics, and built-in virtual graph database and link analysis

#5 Minimize copies of data
• Can leave and secure data in sources, a Data Lake or indexes

#6 Data discovery, metadata repository and data governance
• Use raw indexes for discovery, metadata and combining with IAM and RBAC for data governance – from edge/bottom up

#7 Write back to data sources
• Can read as well as insert, delete and update
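The change data capture point under goal #2 can be illustrated with a small sketch of incremental index maintenance. The event shape (`op`, `row_id`, `old_key`, `new_key`) is an assumption made for this example; real change feeds differ by source system.

```python
def apply_change(index: dict[str, set[int]], event: dict) -> None:
    """Apply one captured change to a key -> row-id index, in place."""
    op, row_id = event["op"], event["row_id"]
    if op in ("update", "delete"):
        # Remove the pointer filed under the old key; drop the key if now empty.
        old_key = event["old_key"]
        row_ids = index.get(old_key, set())
        row_ids.discard(row_id)
        if not row_ids:
            index.pop(old_key, None)
    if op in ("insert", "update"):
        # File the pointer under the new key.
        index.setdefault(event["new_key"], set()).add(row_id)

index = {"acme corp": {1, 2}, "globex": {3}}
events = [
    {"op": "update", "row_id": 2, "old_key": "acme corp", "new_key": "acme corporation"},
    {"op": "delete", "row_id": 3, "old_key": "globex"},
    {"op": "insert", "row_id": 4, "new_key": "initech"},
]
for event in events:
    apply_change(index, event)
```

Each event touches only the affected keys, so the index can follow the source closely without a full rebuild or a bulk re-read of the source.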

Software Defined Data (SDD)

• Implementation and alignment of use-cases is the key to driving Enterprise IP. Technology prohibits this due to binding of data elements within applications

• Freeing data to create workflows will dramatically reduce time to market

• Incrementally developing enterprise use-cases through SDD drives innovation along a next-gen path, allowing a reinvention of the enterprise

• Agnostic de-coupling of silo solutions drives speed to market

• Use-case consumption for any user, on any device, anywhere securely enhances collaboration and productivity

Software Defined Data (SDD)

• Process to eliminate upwards of 95% of today’s regression issues – translates into 50-75% calendar time savings

• Business logic and code base functions grow incrementally as business dictates

• Cumulative code growth ensures reuse, optimal performance and agnostic access. You also benefit from globally available logic

• Zero impact deployments eliminate downtime and simplify the SDLC

• Dynamic, intelligent workflows consume new features when live

AI and NLP are the new UI for many applications

• Leveraging next-gen cognitive rapidly delivers functionality and results

• Millennial workforce alignment

• Dramatic decrease in training requirements

• Dramatic decrease in time to market on features and results

• Disconnect 3rd party backend solutions from natural language UI

• Allow seamless app upgrades by using AI UI, which will interact with both old and new systems

Abstracted 3-tier architecture connected through a Smart Data Fabric

SDD Data Fabric – API Catalog

Line-of-Business Users; Ecosystem Solutions

SYSTEM OF RECORDS – INTERNAL / EXTERNAL DATA SOURCES

SYSTEM OF INSIGHTS – ANALYTICS WAREHOUSE

SYSTEM OF ENGAGEMENT - APPLICATIONS

PLATFORM MANAGEMENT SERVICES

SDD is the First Paradigm Shift in how data and analytics are managed in a common meta-object framework

[Diagram labels]

• Standard drivers, APIs, Web/data services, SQL, Spark SQL and other query languages
• Identity Management Application and Other Corporate Security; Other Corporate Governance; Access Control
• Discovery, Security, Quality, Transformation, Standardization, Relationships, Master Data Management
• Virtual Graph Database
• INDEXES*
• Metadata Repository, Data Governance, Role-based Access Control
• Audit Logs, API Management, Business Rules
• Distributed Data Management, Including Data Source Results Read and Write Back
• Optional Data Lake – Real-time, Incremental or Batch Updated, Centralized or Distributed
• Profiling, Analytics
• BPM Software
• Tiers: SYSTEM OF RECORDS – INTERNAL / EXTERNAL DATA SOURCES; SYSTEM OF INSIGHTS – ANALYTICS WAREHOUSE; SYSTEM OF ENGAGEMENT - APPLICATIONS

*Indexes per data source on structured, unstructured and semi-structured data that can either store or not store (default) data

The Second Paradigm Shift is the concept of the Analytics Warehouse

An example of an Analytics Warehouse architecture – data ingestion/warehouse model

An example of an Analytics Warehouse architecture spanning enterprise systems – federated model

Examples of applying SDD and an Analytics Warehouse to healthcare analytics

Healthcare Clinical Network Management 3-Tier Architecture

Enterprise/ Ecosystem-wide Master Data Management Platform

Longitudinal Patient Record

Analytical Insights Catalog and Ecosystem

Clinical, Financial, Administrative, Operational

End-End Integrated Application Ecosystem Environment

across Continuum of Care

Development and IT Operations Support Environment

Human Capital Management, Finance, Sales & Marketing

SDD Data Fabric

Healthcare Cloud

3rd Party Partner Ecosystem

End-to-End Security

AI Driven Operations Automation

SoR

SoI

SoE

SDD Data Fabric enables a Longitudinal Patient Record (LPR) view across multiple System of Records, across multiple enterprises

• Transparent distributed data management layer that plugs-and-plays in existing IT infrastructures
• Complements and leverages existing IT systems, tools and applications
• Leave and guard data in sources, copies, e.g., Data Lake, or stored in indexes – a hybrid approach
• Address upfront data discovery, security, quality, standards, MDM and other data-related processes
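As a rough illustration of what a longitudinal view across several systems of record involves, the sketch below links records that share a cleaned name and date of birth and orders their events by date. All field names and source names are hypothetical, and exact matching on two fields is a deliberate simplification; real master data matching is considerably more involved.

```python
from collections import defaultdict

def match_key(record: dict) -> tuple:
    """Build a simple identity key from a cleaned name and the date of birth."""
    name = " ".join(record["name"].split()).casefold()
    return (name, record["dob"])

def build_lpr(records: list) -> dict:
    """Group events from all sources by patient and sort each timeline by date."""
    timelines = defaultdict(list)
    for rec in records:
        timelines[match_key(rec)].append((rec["date"], rec["source"], rec["event"]))
    return {key: sorted(events) for key, events in timelines.items()}

records = [
    {"source": "hospital_ehr", "name": "Jane  Doe", "dob": "1980-02-01",
     "date": "2019-03-10", "event": "admission"},
    {"source": "clinic_emr", "name": "jane doe", "dob": "1980-02-01",
     "date": "2018-11-02", "event": "HbA1c test"},
    {"source": "claims", "name": "John Roe", "dob": "1975-07-15",
     "date": "2019-01-20", "event": "claim paid"},
]
lpr = build_lpr(records)
```

The ISO date strings sort chronologically as plain text, which keeps the example free of date parsing.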

Use cases from healthcare that combine data and analytics management

Use Cases: Applications

Clinical Applications
✓ Diabetes, Hypertension, Heart Failure, etc.
✓ Gaps in Care
✓ Predictive Readmissions Management
✓ Clinical Wellness Management

Operational Management
✓ Operational Management – Hospital
✓ Operational Management – Physician Practices
✓ Physician Quality Reporting Scores

Financial Performance
✓ Financial Management – Hospital
✓ Financial Management – Physician Practices
✓ Claims Analytics

Regulatory Reporting
✓ Hospital Value Based Purchasing (HVBP)
✓ HEDIS
✓ Patient Centered Medical Home Scorecard
✓ MU2 Clinical Quality Measures – Hospitals/Physicians
✓ MU2 Usage Scorecard – Physicians
✓ ACO Quality Reporting
✓ Hospital Outpatient Quality Reporting

Ability to create patient cohorts
✓ Cohort Manager/Chronic Condition Management

Population Management
✓ Population Focus & Population Care
  • Patient Similarity
  • Comparative Effectiveness Research
  • Predictive models – Chronic disease management

Conclusion of why ETL is no longer King, long live SDD

• SDD enables a data management paradigm shift

• SDD supports an analytics management paradigm shift

• ETL still has value, but not exclusively

• SDD can greatly enhance, complement and/or replace ETL in the future

• SDD is more suited than ETL to the new world of:
  • Data everywhere
  • API data services/catalog
  • Parallel distributed processing
  • Event and workflow processing
  • Near real-time architectures

Thank you

Q&A

Date post:	20-May-2020