Post on 08-Feb-2017
transcript
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Analytics Data Lab
The power of Big Data Investigation and Advanced Analytics to maximize the Data Capital
Roberto Falcinelli, Senior Manager - Sales Consulting & Business Development, Oracle Business Analytics
24 January 2017, Webinar for Fondazione CRUI
Abstract
Analytics Data Lab
The power of Big Data Investigation and Advanced Analytics to maximize the Data Capital
Data is the new capital: like financial capital, it is a resource that must be managed, collected and kept safe, but it must also be invested by organizations that want to gain a competitive advantage. Data is not a new resource, but only today, for the first time, is it available in abundance together with the technologies needed to maximize its return. In the same way, electricity was a laboratory curiosity for a long time, until it was made available to the masses and completely changed the face of modern industry.
That is why accelerating change requires an innovative approach to executing Big Data initiatives: an analytics laboratory as a catalyst of innovation (Data Lab).
Join this webinar to discover, through use cases and concrete customer experiences, how Oracle provides the technologies and solutions that made them successful.
Agenda
1. Data Capital: Big Data & Analytics Market Trends
2. Data Lab Catalyst of Innovation
3. Oracle Big Data Analytics capabilities enabling the Data Lab
4. A Data Lab Demo Story Example
5. Use Cases & References
The Rise Of Data Capital
1. Data is now a kind of capital
2. Companies & organizations must execute new strategies to compete
3. Data needs to be secured and invested like the economic capital
A new shift for Analytics
Analytics 1.0: Era of Business Intelligence
• Before Big Data (BBD)
• Batch oriented internal data collection & preparation
• Batch oriented Analysis/Reporting
Analytics 2.0: Era of Competing on Analytics
• After Big Data (ABD)
• Started with Web Companies
• Extend analytics to external, larger and less structured datasets
• New technologies for new challenges
• Recognition of Data Science
Analytics 3.0: Era of Data Enriched offerings
• Embed Analytics in New Products / Services
• Affordable for every industry
• New and old data management technology combined
• Faster “test-do-learn” cycle
• Analysts focus on Data Discovery
• Data Lab & Data Factory: turn exploratory analysis into production capabilities
Focus on improving performance
DATA as a CAPITAL to invest and gain competitive advantage
Source: Tom Davenport – Analytics 3.0 – HBR - Dec 2013
A change of paradigm in Information Management
Build a Data Reservoir to serve as an enabler for more powerful DWH’s
Provide all tools needed to get value out of the Data Reservoir
Empower business Users to get value from Big Data (not only geeks or data scientists)
[Diagram labels. Before: Existing Sources, Data Warehouse. After: Existing Sources, Emerging Sources + existing unused internal sources, Data Reservoir, Data Warehouse]
A change of paradigm in Information Consumption
From “Business Intelligence” ...
• Known and structured access patterns to structured information
• Analytics activity in “delayed mode” vs data generation and preparation time
• IT central role in guidance and skills
• High complexity in data management makes analytical tools a secondary priority
... To “Business Inspiration”
• Free-hand, search-oriented information access based on new business models
• Real-time analytic activity
• New roles and skills are leading: Data Officers, Data Scientists, Data Analysts
• Optimized usability for business-wise users is first priority
A “bi-modal” approach to Business Analytics
Data Lab Catalyst of Innovation
Edison’s Invention Factory
Data Comes from Activity
Big Data, as a Global Phenomenon, Is Disrupting Industries
• PROCESSES
• THINGS
• PEOPLE
Data and the Innovation Process
DATA LAB
DATA FACTORY
DATA WAREHOUSE
Stages: RESEARCH, INVENT, DEVELOP, COMMERCIALIZE
Example topics: Churn, Monetization, Upsell, 360, Quality, Product Design
The Pillars and the Process
Data Lab Ecosystem = Pillars + Process
Data Lab Ecosystem: built according to the key requirements.
Pillars: technology areas that provide the required features for current and future needs.
Process: combines both the experimental approach and the production mainstream to maximize the “data capital”.
The Pillars
Data Lake (all data store)
Data Visualization
• Easy, visual, friendly way of telling a story
• Fast, self-service way of mashing up personal and enterprise data
• Powerful, full enterprise scale capability
Machine Learning & Graph
• Provides a broad range of ML algorithms based on open source, market leading technologies
• Combines ML both in the Lab and in the Process
• Extends ML with Graph features for analytics on networks based on relationships
Data Discovery
• Explores available source data and their relationships (schema-on-read approach)
• Transforms data on the fly and discovers hidden patterns
• Foundation of the “Lab”
Security
• Secures data at rest (encryption) and on the fly
• Provides Access Control (SSO, LDAP) throughout the architectural components
• Profiles users according to their roles
All in Cloud
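The schema-on-read approach named under Data Discovery can be illustrated with a minimal Python sketch. The raw records, the field names and the `infer_schema` and `read_amounts` helpers are hypothetical, invented for illustration; this is not how any Oracle product implements the idea. The point is only that structure is inferred and applied when data is read, not enforced when it is stored.

```python
import json

# Hypothetical raw records as they might land in a data lake: no schema was
# enforced at write time, so fields and types vary from record to record.
RAW_LINES = [
    '{"id": 1, "amount": "120.50", "channel": "web"}',
    '{"id": 2, "amount": 80, "note": "manual entry"}',
    '{"id": 3, "channel": "branch"}',
]

def infer_schema(lines):
    """Scan raw JSON lines and report, per field, observed types and fill rate."""
    records = [json.loads(line) for line in lines]
    seen = {}
    for record in records:
        for field, value in record.items():
            entry = seen.setdefault(field, {"types": set(), "count": 0})
            entry["types"].add(type(value).__name__)
            entry["count"] += 1
    return {
        field: {
            "types": sorted(entry["types"]),
            "completeness": entry["count"] / len(records),
        }
        for field, entry in seen.items()
    }

def read_amounts(lines):
    """Apply a schema only at read time: coerce 'amount' to float, None if absent."""
    amounts = []
    for line in lines:
        value = json.loads(line).get("amount")
        amounts.append(float(value) if value is not None else None)
    return amounts

schema = infer_schema(RAW_LINES)
amounts = read_amounts(RAW_LINES)
```

The profile in `schema` (types seen and completeness per field) is the kind of information a discovery tool would surface before anyone commits to a fixed model of the data.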
From Pillars to Process
Mainstream: intra-day consolidated tasks
Lab: innovative tasks not yet released
• Collect source data and explore their contents
• Select and prepare data for exploitation
• Experiment on data through advanced analytics
• Bring the value into production
• Distribute insights and analyze the return
Roles: Experts, Data Scientists, Consumers
Advanced Analytics Approach
Induction: “Data Driven Research”, reasoning from the data to the general theory
Data Discovery in the Lab (Data Scientists, Experts)
Source data are initially explored to find hidden relationships. This is the basis for picking relevant features to feed prediction models (“feature engineering”).
Machine Learning in the Lab (Data Scientists)
Once the data context has been outlined and the most relevant features identified, ML models can be built and evaluated over historical and new (lab) data.
Machine Learning on the Process ...
Advanced Analytics in the Mainstream (Consumers)
The final step is to run ML models as well as new patterns in the mainstream and make their outcome available to the broad user community through Data Visualization and Business Intelligence.
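The two lab steps on this slide (pick relevant features, then build and evaluate a model on historical and new data) can be sketched in plain Python. Everything here is an assumption made for illustration: the tiny dataset, the field names, the use of Pearson correlation for ranking, and the one-feature threshold "model". It stands in for the workflow only, not for the tooling the slide refers to.

```python
from math import sqrt

# Hypothetical data: historical rows for fitting, new (lab) rows for evaluation.
HISTORICAL = [
    {"transfers_out": 2, "age": 40, "invests": 0},
    {"transfers_out": 3, "age": 35, "invests": 0},
    {"transfers_out": 9, "age": 42, "invests": 1},
    {"transfers_out": 11, "age": 38, "invests": 1},
]
LAB = [
    {"transfers_out": 1, "age": 50, "invests": 0},
    {"transfers_out": 12, "age": 30, "invests": 1},
]

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0

def rank_features(rows, features, label):
    """Order features by absolute correlation with the label (strongest first)."""
    labels = [r[label] for r in rows]
    scores = {f: abs(pearson([r[f] for r in rows], labels)) for f in features}
    return sorted(scores, key=scores.get, reverse=True)

def fit_threshold(rows, feature, label):
    """Toy model: midpoint between the class means of one feature.
    Assumes the positive class has the higher values."""
    pos = [r[feature] for r in rows if r[label] == 1]
    neg = [r[feature] for r in rows if r[label] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(rows, feature, label, threshold):
    """Share of rows where 'feature > threshold' matches the label."""
    hits = sum((r[feature] > threshold) == bool(r[label]) for r in rows)
    return hits / len(rows)

ranked = rank_features(HISTORICAL, ["transfers_out", "age"], "invests")
threshold = fit_threshold(HISTORICAL, ranked[0], "invests")
lab_accuracy = accuracy(LAB, ranked[0], "invests", threshold)
```

Evaluating on rows the model never saw (`LAB`) is the part that matters before anything moves to the mainstream.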
Bring all the pieces into a blueprint
[Blueprint diagram labels: Data Sources; Big Data Platform; Data Lake; Data Factory; Data Lab; Analytics Platform; Data Services; Applications; People; Big Data NoSQL; Data Integration and Governance; Big Data Discovery; Machine Learning; Graph Analytics; IoT; Database + In-Memory; Data Visualization & Analytics Services; DataFlow ML]
Machine Learning with Oracle
Machine Learning at Scale
Huge set of predefined models
• Extends Spark MLlib with R CRAN models and packages
• 35 built-in Graph Algorithms
Based on Standards
• R Language and models
• Spark MLlib on Hadoop
• Python, Java and Scala Spark APIs
• Gremlin and Blueprints for Property Graph
Both on prem and on public cloud
• Transparently move ML models and workloads between on-premise and public cloud
Cross Technologies
• ORE on Oracle databases
• ORAAH, Spark MLlib, Big Data Spatial and Graph on BDA
• Real Time Decision and Stream Explorer
• Big Data Discovery
Optimized for performance
• ORE and ORAAH are optimized versions of R that exploit parallel processing and the hardware capabilities of modern CPUs.
• Oracle Engineered Systems speed up ORE, ORAAH and Spark model processing, shortening time to value.
Tightly Integrated
• Machine learning capabilities integrated out-of-the-box throughout the Oracle stack, from data stores to front end analytics and streaming processing, via data integration (ODI).
Oracle Big Data Discovery. The Visual Face of Big Data
Find, Explore, Transform, Discover, Share
A Data Lab Demo Story: Example in Banking
Luigi, Analyst (aspiring Data Scientist)
Corporate data is organized in a catalog. Luigi can view the data he has been granted access to. He can also select datasets by their characteristics and immediately view all their details.
After selecting the bank transfers dataset, Luigi can immediately profile its content, viewing graphically the completeness and value distribution of each attribute.
Luigi combines several attributes to highlight their correlation. In this case, the ratio between incoming and outgoing transfers turns out to be inverted for the year 2014. Depending on the characteristics of the attributes, Luigi can choose among different chart types.
Filtering on the year 2014 and selecting only outgoing transfers, Luigi displays the transaction amounts for each description in descending order. Transactions related to the purchase of securities stand out, which makes him curious.
This finding prompts Luigi to dig deeper. To do so, he creates a new project and adds the transfers dataset to it.
To better identify the transactions of interest, Luigi defines a transformation function that checks for certain keywords in the transfer description and creates a new attribute. The transformation is added to the script and executed immediately.
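A transformation of this kind can be sketched in a few lines of Python. The keyword list, the record layout and the attribute name `investment_related` are assumptions made for illustration; the slide does not say which keywords or which transformation language Luigi actually used.

```python
# Hypothetical keywords hinting at a securities purchase (Italian terms,
# since the demo data is from an Italian bank).
KEYWORDS = ("titoli", "fondi", "azioni")

def has_investment_keyword(description, keywords=KEYWORDS):
    """Return True if any keyword occurs in the description, ignoring case."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)

def add_keyword_flag(transfers):
    """Return new records with a derived boolean attribute; inputs are not modified."""
    return [
        {**t, "investment_related": has_investment_keyword(t["description"])}
        for t in transfers
    ]

# Hypothetical sample transfers.
TRANSFERS = [
    {"id": 1, "description": "Acquisto TITOLI di stato"},
    {"id": 2, "description": "Affitto marzo"},
]
flagged = add_keyword_flag(TRANSFERS)
flags = [t["investment_related"] for t in flagged]
```

Returning new records instead of editing in place mirrors the idea that a transformation is a step in a script that can be previewed and re-run.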
To broaden the analysis, Luigi adds to the project the customer master dataset and the public dataset containing all the ABI and CAB codes of Italian banks.
To complete his analysis, Luigi moves on to the actual discovery activity, composing the page that will present the results. Starting from a blank page, he selects components from a large library, which can also be extended with custom components.
This is the complete page built by Luigi, who can now continue his investigation by selecting the data according to his needs.
By selecting only outgoing transfers that contain at least one of the keywords and were made by customers in the 30,000 to 50,000 income bracket to three major banks, Luigi identifies the customers to whom investment products could be offered.
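The selection described on this slide amounts to a join between transfers and customers followed by four filters. The sketch below is a hedged illustration: the field names, the bank identifiers and the sample rows are all hypothetical, and the real demo performs this interactively in the tool, not in code.

```python
# Hypothetical sample data; field names and bank identifiers are invented.
TRANSFERS = [
    {"customer_id": "C1", "direction": "out", "investment_related": True, "beneficiary_bank": "BANK_A"},
    {"customer_id": "C2", "direction": "out", "investment_related": True, "beneficiary_bank": "BANK_Z"},
    {"customer_id": "C3", "direction": "in", "investment_related": True, "beneficiary_bank": "BANK_A"},
    {"customer_id": "C4", "direction": "out", "investment_related": True, "beneficiary_bank": "BANK_B"},
    {"customer_id": "C1", "direction": "out", "investment_related": False, "beneficiary_bank": "BANK_C"},
]
CUSTOMERS = [
    {"customer_id": "C1", "income": 42_000},
    {"customer_id": "C2", "income": 35_000},
    {"customer_id": "C3", "income": 45_000},
    {"customer_id": "C4", "income": 75_000},
]
TARGET_BANKS = {"BANK_A", "BANK_B", "BANK_C"}

def select_targets(transfers, customers, target_banks, income_range=(30_000, 50_000)):
    """Return sorted IDs of customers with at least one qualifying transfer."""
    low, high = income_range
    by_id = {c["customer_id"]: c for c in customers}
    selected = set()
    for t in transfers:
        customer = by_id.get(t["customer_id"])
        if customer is None:
            continue  # transfer without a matching customer record
        if (
            t["direction"] == "out"
            and t["investment_related"]
            and t["beneficiary_bank"] in target_banks
            and low <= customer["income"] <= high
        ):
            selected.add(t["customer_id"])
    return sorted(selected)

targets = select_targets(TRANSFERS, CUSTOMERS, TARGET_BANKS)
```

Collecting IDs in a set means a customer with several qualifying transfers still appears once in the target list.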
The list of identified customers is immediately available for further investigation and can become a new dataset.
Luigi decides to propose this target to marketing for a campaign, so he generates a new dataset that the campaign manager can access directly.
A Data Lab Demo Story: Example in Banking
Barbara, Campaign Manager
Barbara, the marketing campaign manager, is notified that a new dataset is available with the target to whom investment products should be offered. She decides to analyze the composition of this new target.
Barbara then creates a new project based on the dataset made available to her.
The data is displayed immediately, and Barbara can change the representation by adjusting the visualization settings.
To analyze the data by province of residence, Barbara adds to the project a new data source downloaded from the internet containing all Italian municipalities.
By selecting province, total average balance and number of customers, Barbara requests a geographic visualization of her target.
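Behind a geographic view like this sits a lookup from municipality to province and a per-province aggregation. The following Python sketch assumes hypothetical field names and a three-row excerpt of a municipality table; it only illustrates the join and the two measures Barbara selects (total average balance and number of customers).

```python
from collections import defaultdict

# Hypothetical excerpt of a public municipality-to-province table.
MUNICIPALITIES = [
    {"municipality": "Sesto San Giovanni", "province": "MI"},
    {"municipality": "Rho", "province": "MI"},
    {"municipality": "Frascati", "province": "RM"},
]
# Hypothetical target customers produced by the earlier analysis.
TARGET = [
    {"customer_id": "C1", "municipality": "Rho", "avg_balance": 12000.0},
    {"customer_id": "C2", "municipality": "Sesto San Giovanni", "avg_balance": 18000.0},
    {"customer_id": "C3", "municipality": "Frascati", "avg_balance": 9500.0},
]

def aggregate_by_province(customers, municipalities):
    """Join customers to provinces and sum balances and head counts per province."""
    province_of = {m["municipality"]: m["province"] for m in municipalities}
    totals = defaultdict(lambda: {"customers": 0, "total_avg_balance": 0.0})
    for c in customers:
        # Customers whose municipality is missing from the lookup are kept
        # under an explicit bucket instead of being silently dropped.
        province = province_of.get(c["municipality"], "UNKNOWN")
        totals[province]["customers"] += 1
        totals[province]["total_avg_balance"] += c["avg_balance"]
    return dict(totals)

by_province = aggregate_by_province(TARGET, MUNICIPALITIES)
```

The resulting dictionary, keyed by province code, is the shape of data a map component would need to shade each province.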
Finally, Barbara adds a chart showing the average balance by education level. When she selects the “slice” for the high-school diploma in this chart, the geographic visualization highlights the provinces where these customers live.
At the end of the campaign, Barbara receives a file with the number of contacts made and the total amount invested per customer. This data source is also added to the project for the final reporting.
With the redemption data, Barbara adds a new visualization to highlight the branches with the highest investments, also comparing the starting average balance and the number of contacts needed to achieve the result.
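The comparison Barbara builds can be approximated as a per-branch summary of the campaign results file. The record layout, the field names and the derived `invested_per_contact` measure are assumptions made for this sketch; the slide does not specify how the figures were computed.

```python
# Hypothetical campaign results, one row per contacted customer.
RESULTS = [
    {"branch": "A", "invested": 10000.0, "contacts": 2, "start_avg_balance": 20000.0},
    {"branch": "A", "invested": 5000.0, "contacts": 3, "start_avg_balance": 10000.0},
    {"branch": "B", "invested": 8000.0, "contacts": 1, "start_avg_balance": 30000.0},
]

def summarize_branches(results):
    """Aggregate per branch and rank branches by total amount invested."""
    totals = {}
    for r in results:
        b = totals.setdefault(
            r["branch"],
            {"invested": 0.0, "contacts": 0, "balance_sum": 0.0, "customers": 0},
        )
        b["invested"] += r["invested"]
        b["contacts"] += r["contacts"]
        b["balance_sum"] += r["start_avg_balance"]
        b["customers"] += 1
    rows = [
        {
            "branch": branch,
            "invested": b["invested"],
            "contacts": b["contacts"],
            "avg_start_balance": b["balance_sum"] / b["customers"],
            # Guard against branches with no recorded contacts.
            "invested_per_contact": b["invested"] / b["contacts"] if b["contacts"] else 0.0,
        }
        for branch, b in totals.items()
    ]
    return sorted(rows, key=lambda row: row["invested"], reverse=True)

ranking = summarize_branches(RESULTS)
```

In this made-up sample, branch A collects more in total while branch B collects more per contact, which is the kind of contrast the slide's visualization is meant to expose.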
Use cases & references
Moving to proactive and predictive
• The Large Hadron Collider (LHC) is the largest cryogenic system in the world: 27 km, 6,000+ superconductors; 600M collisions per second, storing 60 TB per year
• Monitoring and diagnostic system (temperature, magnetic & electrical fields, pressure) with Data Discovery on 15 GB of daily log files
• Run predictive maintenance models to detect faulty cryogenic valves in Oracle Database using R
• Deployed Oracle Big Data Discovery, Oracle Database, Oracle Advanced Analytics
BigData@CERN : the Data explosion
The CERN Accelerator Logging Service is powerful but also brings new challenges: exploding datasets versus strict analysis time requirements (seconds)
BigData@CERN : the CERN Big Data Solution
Accelerator Postmortem Analysis
• Diagnostic on failures
• Continue operations safely
• Interventions Required
• Designed for CERN LHC
• Extended to injectors complex (SPS)
• External Post Operational Checks
• Injection Quality Checks
Main challenges
• Stringent timing constraint (every 30 seconds)
• High scalability
• Huge data storage
• IO throughput
• Big Data Streaming Analytics
BigData@CERN : Big Data Discovery use cases
1. EXPLORATION & DISCOVERY
• Interactive catalog of all data, attribute statistics, data quality and outliers
• Dashboards and applications
2. TRANSFORMATION
• Use of Spark in Hadoop applying built-in transformations and proprietary scripts
• Preview of results, undo, commit and replay transformations
• Data Enrichment:
  • Text-based: entity extraction, relevant terms, sentiment, language detection
  • Geo-based: address, IP, reverse
3. COLLABORATION
• Share and bookmark transformed datasets
• Future use of Notebooks
Data Lab To Find Savings and Cost Reductions in Health Care Budget
• United Kingdom’s National Health Service
• Identify billing and identity fraud
• Optimize treatment by reducing use of less effective medical procedures
• Deployed Oracle Advanced Analytics, and Oracle Business Intelligence on Oracle Exadata and Oracle Exalytics
Potential savings identified: $156M
“With one vendor providing the whole solution, it’s very easy for us.” - Nina Monckton, NHS BSA
NHS BSA
• Responsible for a third of the NHS budget
• Manages prescription reimbursement
• Delivery of supply chain services to the NHS
• NHS Pensions
Challenges
• 4 million prescriptions processed/day
• 30%+ entered manually
• Need to find drugs misuse and fraud & error
• Unable to monitor best practice (drug administration versus outcomes at national level)
• Inability to link structured and unstructured data together
Preventing Fraud for European Health Insurance Card
Analyzing Billions of Records in Minutes (prescription)
Analyzing much larger sets of patient data, the NHSBSA can provide insight that is helping to improve standards of care
Analyzing Unstructured Text to Measure Satisfaction
DALL - Data Analytics Learning Laboratory
Data Scientist, Data Consultant, Statisticians, Data Lab Coordinator, Information and Data Analyst. Team initially supported by Oracle experts
Our target for 2015/16 is to highlight at least £200 million of potential savings for the NHS through the DALL. The thermometer below shows what :
The DALL, however, isn’t purely about saving money as we can also provide valuable insight into patient care, safety, probity and quality within the NHSBSA and wider NHS.
Served 100K Customers
Digital Couponing: a new pattern of adoption
• European leader in the design, creation and management of technology infrastructures and services for Financial Institutions, Central Banks, Corporates and Public Administration bodies
• SIA Group serves customers in 40 countries and also operates through its subsidiaries in Hungary and South Africa
• New Initiative to enter into Digital Ecosystem with Differentiators for New Markets extending SIA’s value chain
Mobility solution allowing real-time cashback on product promotions
Improve Gaming Experience with Big Data Analytics
• Manage and analyze up to 300 billion events per day
• Understand and segment players
• Quickly correct game play problems
• Deployed Oracle Advanced Analytics and Oracle R Advanced Analytics for Hadoop on Oracle Big Data Appliance and Oracle Database Appliance
62% increase in revenue in one region by tailoring messages and playing experience