Informatica Intelligent Data Lake
[email protected] October 2016 at 10:30 a.m.
[email protected] October 2016 at 4:00 p.m.
#1 In…
Data Security
Cloud Data Management
Big Data Management
Data Integration
Master Data Management
Data Quality
Enterprise Imperative: Capitalize on the Value of Big Data
Harness the fast-growing enterprise ‘natural’ resource, Big Data, generated by business applications, devices, sensors and users
Rapidly turn raw data into actionable insights to gain competitive advantage
New Trends in Data Management
Collect Everything: Raw/refined, structured/unstructured, over an extended period of time
New User Class: Analysts and scientists explore, reuse, refine and mash up data in self-service mode
Flexible Access: Batch, real-time, interactive, on-premise, cloud etc.
What are we hearing from customers?
“If you don’t understand the [data] assets [in the lake], how do you use them?”
“Need visibility to the metadata for those data assets… provide visibility into data lineage”
“Too many data silos making it impossible to know what data can be trusted”
“It's frustrating to see a really promising dataset only to find out it's really bad.”
“Don't have a good way of sharing … assets… connect the producer of data to consumer of data.”
“Prepping & cleaning the data takes us 2-3 weeks, sometimes longer”
Data Intelligence Transforms Data Management
IT-Driven DATA PREPARATION
Siloed DATA USAGE
Self-Service DATA PREPARATION
Collaborative DATA USAGE
Ungoverned DATA ASSETS
Governed DATA ASSETS
Find & Access Any Data Centrally
Discover Data Relationships
Prepare & Share Relevant Data
Operationalize Business Insights Quickly
Informatica Intelligent Data Lake
For Data Analysts and Data Scientists
• Enterprise data assets search and discovery
• Data acquisition from on-premise and cloud sources, batch and real-time
• Data set recommendations
• Excel-like Data preparation, enrichment for large data sets
• Data publishing and sharing
Intelligent Data Lake Mission
Collaborative self-service Big Data preparation solution for data analysts to rapidly discover and turn raw data into insights with quality and governance, powered by data intelligence
Home Page
Recent projects, uploads and publications
Main Navigation Bar
My Recently Opened Projects
INFORMATICA CONFIDENTIAL – DO NOT DISTRIBUTE
My Recently Uploaded Data Assets
My Recently Published Data Assets
Quick Project Overview
Search and Discovery
Data discovery through a powerful search engine to find relevant data
Data Asset Overview
Overview with asset attributes and integrated profiling stats
Asset attributes collected from the source system
Asset attributes enriched by users to add business context
Column profiling stats including Null/Unique/Duplicate percentages, inferred data types and data domains
Detailed stats include value and pattern distributions
Add a data asset to a project from any exploration view
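The column profiling stats named on this slide can be illustrated with a minimal sketch. It is not Informatica's implementation: the function names, the three-way type inference, and the "A for letters, 9 for digits" pattern notation are assumptions chosen for illustration, and data-domain inference is left out.

```python
from collections import Counter


def pattern_of(value):
    """Map letters to 'A' and digits to '9'; keep other characters (assumed notation)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def infer_type(values):
    """Guess integer, decimal or string from non-null string values."""
    def all_cast(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_cast(int):
        return "integer"
    if all_cast(float):
        return "decimal"
    return "string"


def profile_column(values):
    """Compute null/unique/duplicate percentages plus value and pattern distributions."""
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    unique = sum(1 for c in counts.values() if c == 1)  # values occurring exactly once
    duplicate = len(non_null) - unique                  # rows whose value repeats

    def pct(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "null_pct": pct(total - len(non_null)),
        "unique_pct": pct(unique),
        "duplicate_pct": pct(duplicate),
        "inferred_type": infer_type(non_null),
        "value_distribution": counts.most_common(),
        "pattern_distribution": Counter(pattern_of(v) for v in non_null).most_common(),
    }
```

For example, `profile_column(["10", "20", "20", None, "7"])` reports 20% nulls, 40% unique, 40% duplicates and an inferred type of `integer`.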
Data Lineage
Interactively trace data origin through summarized lineage views for analysts
A simplified view of lineage that highlights the end points and not the transformations in between
Detailed lineage is available on expanding the lineage path
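The idea of a summarized lineage view, showing end points and hiding the transformations in between, can be sketched as a small graph reduction. This is only an illustration under assumed inputs: lineage is represented here as a list of hypothetical `(source, target)` edges, and the asset names in the usage note are invented.

```python
def summarize_lineage(edges):
    """Collapse a detailed lineage graph into (origin, end point) pairs.

    `edges` is a list of (source, target) tuples. Origins are nodes with no
    incoming edge; end points are nodes with no outgoing edge.
    """
    children = {}
    nodes = set()
    has_parent = set()
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
        nodes.update((src, dst))
        has_parent.add(dst)

    summary = set()
    for origin in (n for n in nodes if n not in has_parent):
        stack, seen = [origin], set()
        while stack:  # depth-first walk; `seen` guards against cycles
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            nexts = children.get(node, [])
            if not nexts:
                summary.add((origin, node))  # reached an end point
            stack.extend(nexts)
    return sorted(summary)
```

With edges `crm.customers -> stg_clean -> join_orders -> lake.customer_orders` and `erp.orders -> join_orders`, the summary keeps only the two origin-to-lake pairs; the intermediate nodes correspond to what the detailed view would show on expansion.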
Data Preview
View sample records (based on user credentials) to get a sense of the data
Right-click on a column header to bring up the Column chooser
The user can move columns around in the grid
Top 500 records are shown as a sample, only if the user has read authorization to the table
Only available for Tables and Views
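The preview rules on this slide (tables and views only, read authorization required, top 500 records) can be expressed as a short guard. This is a sketch with assumed data shapes: the `asset` dictionary, the `readers` mapping and the function name are hypothetical and do not reflect the product's API.

```python
from itertools import islice

PREVIEW_LIMIT = 500                      # sample size stated on the slide
PREVIEWABLE_TYPES = {"table", "view"}    # preview is only offered for these


def preview(asset, user, readers, rows):
    """Return up to PREVIEW_LIMIT rows, or an empty list when preview is not allowed.

    asset   -- dict with "name" and "type" keys (assumed shape)
    readers -- dict mapping asset name to the set of users with read access
    rows    -- any iterable of records
    """
    if asset["type"] not in PREVIEWABLE_TYPES:
        return []
    if user not in readers.get(asset["name"], set()):
        return []
    return list(islice(rows, PREVIEW_LIMIT))
```

Using `islice` means the full table is never materialized just to show a sample.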
Relationship View
Shows the ecosystem of the asset in the enterprise based on association to other assets
Get a 360-degree view of a data asset using the relationship view. Includes related tables, views, domains, reports, users etc.
Ability to zoom, find specific assets in the view and filter by asset types
Expand relationship circles to get more details on relationship types and objects
Project Details
Overview, Worksheets, Publications from this Project and Recommendations
Manage collaborators and their privileges
Click on Prepare to start Data Preparation
Worksheets get added for each asset the user adds to the Project
Recommendations are shown based on assets added to the project
Project Overview with Owner information, Description etc.
Navigate to All Projects
Project Details – Worksheets Panel
Prepare and Publish Worksheets
Click on Prepare to start Data Preparation
Worksheets get added for each asset the user adds to the Project
Publications and their status
Worksheets created by users during the data preparation stage by combining, aggregating, merging, copying etc.
Shows the status of worksheets and which ones are being worked on. Some may be in an error state if sampling failed.
Shows which sources were used to create this sheet. Some are from underlying Hive tables, some are derived from other worksheets.
Click on the Publish button to publish this sheet. Only Work In Progress sheets can be published.
Data Asset Recommendations
Alternate and additional data asset recommendations based on other users’ actions
Alternate: assets that have been used by others instead of assets added in the project
Additional: assets that have been used by others in addition to the assets in the project
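The distinction between the two recommendation types can be made concrete with a toy example. The slides do not say how the product derives these recommendations, so everything here is an assumption: other users' projects are modeled as plain asset lists, and a hypothetical `similar` mapping stands in for whatever signal identifies one asset as a substitute for another.

```python
def recommend(project_assets, other_projects, similar):
    """Return (additional, alternate) asset recommendations.

    project_assets -- assets already in the current project
    other_projects -- list of asset lists from other users' projects
    similar        -- hypothetical mapping: asset -> set of substitute assets
    """
    mine = set(project_assets)
    additional, alternate = set(), set()
    for assets in other_projects:
        assets = set(assets)
        # "Additional": used by others together with one of my assets.
        if assets & mine:
            additional |= assets - mine
        # "Alternate": others used a substitute in a project that lacks my asset.
        for asset in mine - assets:
            alternate |= (similar.get(asset, set()) & assets) - mine
    return sorted(additional), sorted(alternate)
```

With a project containing `orders_2016`, one peer project using `orders_2016` with `customers`, and another using `orders_clean` with `customers`, the sketch yields `customers` as additional and `orders_clean` as alternate, provided `similar` marks `orders_clean` as a substitute for `orders_2016`.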
Data Preparation
Excel-based data preparation on sample data
Data is sampled and loaded into worksheets
New sheets are created by the user for combined sheets, merged sheets etc. as part of preparation
Sheet-level summary
Sheet-level suggestions
Data Preparation continued…
Excel-based data preparation on sample data
New formula definition with type-ahead
Large number of functions available for all types of data: string, numeric, date, statistical, math etc.
Advanced functionality such as Join, Merge, Aggregate, Filter, Sort etc.
New values are calculated and shown right away
Data Preparation continued…
Excel-based data preparation on sample data
Column-level summary
Column value distributions
Column-level suggestions
Data preparation steps are captured as a “Recipe”
Data Publication
Execution of data preparation steps on actual data using an Informatica mapping
Publish the output of data preparation steps back to the lake
Recipe steps are translated into an Informatica mapping
The Informatica mapping is handed over to the BDM platform for execution on actual data sources
The BDM platform uses either Map/Reduce, Blaze or Spark to execute the mapping
The mapping is available to ETL specialists to open in the Informatica Developer tool to operationalize
The user's credentials are used to access the underlying database
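The recipe idea, steps recorded while working on a sample and later replayed on the actual data, can be sketched in a few lines. This does not represent how recipe steps are translated into an Informatica mapping or executed on the BDM platform; the step names (`filter`, `add_column`, `sort`) and the list-of-dicts data model are assumptions made for illustration.

```python
def apply_recipe(rows, recipe):
    """Replay an ordered list of preparation steps on a list of row dicts.

    Each step is an (operation, argument) tuple; the same recipe can be
    applied first to a sample and later to the full data set.
    """
    for op, arg in recipe:
        if op == "filter":
            rows = [r for r in rows if arg(r)]
        elif op == "add_column":
            name, fn = arg
            rows = [{**r, name: fn(r)} for r in rows]
        elif op == "sort":
            rows = sorted(rows, key=lambda r: r[arg])
        else:
            raise ValueError(f"unknown recipe step: {op}")
    return rows


# Steps captured during preparation on the sample (hypothetical example).
recipe = [
    ("filter", lambda r: r["amount"] >= 20),
    ("add_column", ("double", lambda r: r["amount"] * 2)),
    ("sort", "id"),
]

full_data = [{"id": i, "amount": i * 10} for i in range(1, 7)]
sample = full_data[:3]

prepared_sample = apply_recipe(sample, recipe)   # interactive feedback on the sample
published = apply_recipe(full_data, recipe)      # same steps replayed on all rows
```

Because the recipe is data rather than a one-off sequence of edits, the same steps give consistent results on the sample and on the full data set.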
My Activities
Show My Downloads and Publications
Details of publication activity
Shows all my uploads and their status
All my publication activities and their status
File Upload
Wizard-based delimited file upload for the Data Analyst
Data is uploaded to the lake as a Hive table and registered in the data catalog
Standard file upload wizard to select encoding, delimiters, column data types etc. with data preview along the way
Ability to Create/Append/Overwrite tables
The user's credentials are used to access the underlying database
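The choices the upload wizard exposes (encoding, delimiter, column data types, and Create/Append/Overwrite) map naturally onto a small loader. The sketch below uses only the Python standard library and an in-memory dictionary as a stand-in for the lake; it does not create Hive tables or register anything in a catalog, and the function name, type labels and mode strings are assumptions.

```python
import csv
import io

# Assumed type labels mapped to Python casts.
CASTS = {"string": str, "integer": int, "decimal": float}


def upload_delimited(raw, table, lake, *, column_types,
                     encoding="utf-8", delimiter=",", mode="create"):
    """Parse delimited bytes with a header row and store the rows under `table`.

    lake -- dict standing in for the data lake (table name -> list of rows)
    mode -- "create", "append" or "overwrite"
    Returns the number of rows loaded.
    """
    text = raw.decode(encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [
        {col: CASTS[typ](record[col]) for col, typ in column_types.items()}
        for record in reader
    ]
    if mode == "create":
        if table in lake:
            raise ValueError(f"table already exists: {table}")
        lake[table] = rows
    elif mode == "append":
        # Assumption: appending to a missing table creates it.
        lake.setdefault(table, []).extend(rows)
    elif mode == "overwrite":
        lake[table] = rows
    else:
        raise ValueError(f"unknown mode: {mode}")
    return len(rows)
```

A call such as `upload_delimited(data, "people", lake, column_types={"id": "integer", "name": "string"}, encoding="latin-1", delimiter=";")` loads a semicolon-separated file into the stand-in lake.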
Customer Experience Feedback
Informatica Marketing Data Lake
Mathilde le Taillandier
EMEA Regional Marketing Director