
SAS Visual analytics

Date post: 15-Dec-2015
Upload: mrdrive
Page 1: SAS Visual analytics

Copyright © 2012, SAS Institute Inc. All rights reserved.

ANALYTICS IN BIG DATA ERA

ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY,

DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNTS OF DATA

MAURIZIO SALUSTI, SAS

Page 2: SAS Visual analytics

Copyright © 2013, SAS Institute Inc. All rights reserved.

AGENDA

From DBMS to BIG DATA

Big Data Analytics

Architectural Considerations

Methods

Data Discovery: Visual Analytics

Page 3: SAS Visual analytics

The ability to generate, communicate, share, and access information has been revolutionized by the increasing number of people, devices, and sensors that are now connected by digital networks.

• People leave information in networks

• Devices provide information in many ways

• Data arrive as a continuous stream of information

• Data are not only measurements but also text, images, and sounds

WHAT IS BIG DATA?

DATA are everywhere:

• IT organizations often collect large amounts of data in an EDW, but these data need to be integrated with many other sources

Page 4: SAS Visual analytics

The spread of information requires a drastic change in the paradigm of how companies collect and use their data:

• Customer data are not only in the company's customer database, which gives only a partial view of customers: e.g., telco operators collect their customers' voice and SMS traffic, while many of those customers communicate through social media and apps.

• Customers send many signals about their market preferences, acting like sensors on the market, but current data storage structures and analytics tools are not able to deal with these data.

CURRENT COMPANY DATA ORGANIZATION

DATA ARE DEPLOYED AS INFORMATION SNAPSHOTS:

• DATA WAREHOUSE

• ANALYTICAL DATAMARTS

The same information is replicated in several data structures, which leads to slow update processes and slow data refresh.

Page 5: SAS Visual analytics

“Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. The ability to store, aggregate, and combine data and then use the results to analyze data in motion has become ever more accessible.

TRENDS IN COMPANY DATA ORGANIZATION

NEEDS:

• AVOID DATA PROLIFERATION

• PROVIDE SEVERAL SCENARIOS FROM THE SAME DATA

• ENRICH DATA WITH SEVERAL SOURCES

• REFRESH DATA QUICKLY

• PROVIDE SCENARIOS SHOWING PATTERNS OF CHANGE

Page 6: SAS Visual analytics

New ways are needed to manage data that are distributed and not structured in the classical way: we need a different paradigm to organize data and, above all, to query them. Collecting and managing several sources opens several new problems:

• Relationship data (GRAPH DATA) can help explain how events spread through a population.

• Data in motion coming from many devices in the field (sensors, smartphones) show dynamic patterns, often with no history of their form

• Data are not always in a structured data model

• We often need to join data that do not share the same keys

• Data often arrive in periodic flows, in near real time

• We often need to recognize patterns in frequently changing data

NEW QUESTIONS

Page 7: SAS Visual analytics

• SQL queries are often useless for reaching these data:

• Information is not organized into DB structures

• Data provide information in very different ways: e.g., text is not easy to query with traditional query languages.

• Merging is driven by fuzzy keys, where group information can be assigned according to statistical relationships.

• Events can be driven by relationships with other data rather than by specific behavior.

ANALYSIS

• You cannot always apply sampling to extract data

• You cannot always join data to define an ABT (analytical base table)

• You often need to know how the environment can influence an event, such as a purchase, a choice, or a change.

• We often need to merge information collected with different scopes.
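Merging on fuzzy keys, as described above, can be illustrated with a small Python sketch. This is not SAS code and not the method used in the deck: it assumes a plain string-similarity score (the standard library's `difflib.SequenceMatcher.ratio()`) and a hypothetical acceptance threshold of 0.8, and `fuzzy_join` and the sample records are made up for illustration. In a statistical record-linkage setting, the similarity function would be replaced by a model-based match probability.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two keys after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_join(left, right, threshold=0.8):
    """Pair each left key with its most similar right key.

    left and right are lists of (key, payload) tuples. Only pairs whose
    similarity reaches the (hypothetical) threshold are returned, as
    (left_key, right_key, score) tuples.
    """
    matches = []
    for left_key, _ in left:
        best_key, best_score = None, 0.0
        for right_key, _ in right:
            score = similarity(left_key, right_key)
            if score > best_score:
                best_key, best_score = right_key, score
        if best_key is not None and best_score >= threshold:
            matches.append((left_key, best_key, round(best_score, 2)))
    return matches


# Illustrative records: customer-DB keys vs. keys seen in a social-media feed
crm = [("Mario Rossi", "customer 001"), ("Anna Bianchi", "customer 002")]
social = [("mario rossi ", "profile A"), ("Anna Bianki", "profile B"), ("Luca Verdi", "profile C")]

result = fuzzy_join(crm, social)
print(result)
```

The nested loop compares every pair, which is fine for a sketch; at big-data scale, candidate pairs would normally be narrowed first (for example by blocking on a coarse key) before scoring.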

Page 8: SAS Visual analytics

BIG DATA

What types?

Page 9: SAS Visual analytics

AGENDA

From DBMS to BIG DATA

Big Data Analytics

Architectural Considerations

Methods

Data Discovery: Visual Analytics

Page 10: SAS Visual analytics

Data are stored in different places, and you have to know the relationship MAPPING among the different sources. Before extracting data, your query has to know where in the network the data reside.

DBMSs and data marts help you analyze data from a single central point: you need only know where the data are and what they mean. Queries are managed directly by the DBMS.

Page 11: SAS Visual analytics

MULTI-POINT DATA HUB

BUILDING BLOCKS OF A BIG DATA ANALYTICS PROCESS

ANALYTICS

Page 12: SAS Visual analytics

REFERENCE ARCHITECTURE EXAMPLE: SAS-RACK IMPLEMENTATION

[Diagram: client, Teradata, Oracle, Hadoop, and Greenplum components]

Page 13: SAS Visual analytics

[Diagram: input/output, Hadoop, metadata, High Performance Analytics, Visual Analytics]

Page 14: SAS Visual analytics

[Diagram: input/output through in-memory, grid computing, and in-database processing; Visual Analytics, metadata, High Performance Analytics, analytical tools]

Page 15: SAS Visual analytics

AGENDA

From DBMS to BIG DATA

Big Data Analytics

Architectural Considerations

Methods

Data Discovery: Visual Analytics

Page 16: SAS Visual analytics

• Worrying about software performance is not a new concept at SAS

• What is new?

» Dedicated high-performance software

» Accelerated development

• Why now?

» Customer needs

» Blade systems have proven viable platforms for high-performance computing

» New computing paradigms

» Partnerships with MPP database vendors

SAS® HIGH-PERFORMANCE ANALYTICS

Page 17: SAS Visual analytics

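SAS PROCEDURES: THEN AND NOW

Then (PROC LOGISTIC): single-threaded; not aware of the distributed computing environment; runs on the client.

Now (PROC HPLOGISTIC): multi-threaded; aware of the distributed computing environment; runs on the client or on the DBMS appliance.

/* Traditional procedure: single-threaded, runs on the client */
proc logistic data=TD.mydata;
   class A B C;
   model y(event='1') = A B B*C;
run;

/* High-performance procedure: same MODEL statement, multi-threaded and distribution-aware */
proc hplogistic data=TD.mydata;
   class A B C;
   model y(event='1') = A B B*C;
run;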

Page 18: SAS Visual analytics

HP PROCS IN A SINGLE SERVER

[Diagram: a SAS process on the operating system, reading SAS datasets from disk ("/filesys") and writing the temp/utility files that support SAS to the same disk]

libname disk BASE "/filesys";
proc hpreg data=disk.source;
   /* analytic stuff… */
run;

SAS process steps:

(1) The SAS process starts on the hardware and O/S

(2) SAS sets up the access library to disk

(3) SAS starts PROC HPREG

(4) HPREG reads data through ACCESS during computation*

(5) Multiple threads are launched to process the incoming data

(6) As execution continues, temporary data is written out to utility files on disk

*SMP HP procs do not load the entire source dataset into RAM: the SAS process uses the MEMSIZE option as a boundary, no differently than MVA ("regular") procs, the DATA step, etc.
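The single-server flow above (data read in bounded pieces, several threads working on the incoming data, partial results combined) can be sketched in Python. This is an illustrative assumption, not how PROC HPREG is implemented: `read_in_chunks`, `partial_stats`, and `threaded_mean_var` are hypothetical names, the statistic computed is just a mean and sample variance, and because of CPython's GIL these threads illustrate the structure rather than deliver a real speed-up for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_in_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows, building one chunk at a time."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def partial_stats(chunk):
    """Work done by one thread: count, sum, and sum of squares of its chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def threaded_mean_var(rows, chunk_size=1000, workers=4):
    """Combine per-chunk partial results into the overall mean and sample variance."""
    n = total = total_sq = 0
    chunks = read_in_chunks(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Hold at most `workers` chunks in memory at once (a stand-in for a MEMSIZE-style bound)
            batch = list(islice(chunks, workers))
            if not batch:
                break
            for c_n, c_sum, c_sq in pool.map(partial_stats, batch):
                n += c_n
                total += c_sum
                total_sq += c_sq
    if n < 2:
        raise ValueError("need at least two rows")
    mean = total / n
    # One-pass formula: fine for this example, but it can lose precision on real data
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


mean, variance = threaded_mean_var(range(1, 10_001), chunk_size=500, workers=4)
print(mean, variance)
```

Batching the submissions matters: `Executor.map` submits every item it is given up front, so passing the whole chunk generator at once would read all data into memory.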

Page 19: SAS Visual analytics

HP PROCS IN A DISTRIBUTED ARCHITECTURE

HADOOP HDAT: SHARED-RACK EXAMPLE

[Diagram: a SAS process on the SAS server, the Hadoop NameNode, and grid nodes 1 to N, each holding its own block of data]

libname a sashdat;
option set=GRIDHOST="NAMENODE";
proc hpreg data=a.source;
   /* analytic stuff… */
   performance nodes=all;
run;

SAS process steps:

(1) The SAS process starts on the hardware and O/S

(2) SAS sets up the access library to disk

(3) SAS starts PROC HPREG

(4) Because of the GRIDHOST setting and the proper access engine, multi-threaded processes are started on the grid nodes (via TKGrid)

(5) As the TKGrid processes start up, ALL data is lifted into RAM from HDFS

(6) Processing occurs in parallel against the in-memory data

(7) Results return to the initiating process on the SAS server
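Steps (4) through (7) follow a common distributed pattern: each node works on its own in-memory block of data, and only compact partial results travel back to the initiating process. A minimal Python sketch of that pattern for a one-predictor least-squares fit, assuming the grid nodes can be simulated by list slices; `node_partial` and `combine_and_solve` are hypothetical names, and this is not a description of how TKGrid or PROC HPREG work internally.

```python
def node_partial(rows):
    """Runs on one grid node: sufficient statistics for y = b0 + b1*x over its local rows."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine_and_solve(partials):
    """Runs on the initiating process: add node results and solve the normal equations."""
    n, sx, sy, sxx, sxy = (sum(p[i] for p in partials) for i in range(5))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x has no variation; the slope is not identifiable")
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1


# Illustrative data y = 2 + 3x, spread over three hypothetical "nodes"
data = [(x, 2 + 3 * x) for x in range(100)]
nodes = [data[i::3] for i in range(3)]

partials = [node_partial(block) for block in nodes]  # in parallel on each node in a real grid
b0, b1 = combine_and_solve(partials)                 # only five numbers per node travel back
print(b0, b1)
```

The design choice is that the fit depends on the data only through sums, so splitting the rows across nodes gives the same answer as a single-node fit while moving almost no data.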

Page 20: SAS Visual analytics

Big data analysis can be done using several analytic strategies.

• SAS collects many different methods, many of them coming from traditional statistical inference analysis using the SEMMA paradigm.

• Others come from stochastic process analysis, for both continuous and discrete events.

• Others come from linear and nonlinear mixed models.

• Graph analysis

Page 21: SAS Visual analytics

AGENDA

From DBMS to BIG DATA

Big Data Analytics

Architectural Considerations

Methods

Data Discovery: Visual Analytics

Page 22: SAS Visual analytics

ANALYTICAL CATEGORIES AND TARGET USAGE

Text Mining
• Parsing large-scale text collections
• Extract entities
• Automatic stemming & synonym detection

Data Mining
• Complex relationships
• Tree-based classification
• Variable selection

Optimization
• Local search optimization
• Large-scale linear & mixed integer problems
• Graph theory

Econometrics
• Probability of events
• Severity of random events

Forecasting
• Large-scale, multiple-hierarchy problems

Statistics
• Binary target & continuous number predictions
• Linear, non-linear, & mixed linear modeling

Page 23: SAS Visual analytics

Data coming from different sources can be tied together using methods such as canonical decomposition. For data in motion, such as data coming from devices, the variability of patterns can be sampled, or the pattern distribution simulated, using Markov chain Monte Carlo (MCMC) methods. Sparse vector data with missing values can be simulated using MCMC or other regression methods. Discrete choice among different events can be modeled using multinomial discrete choice models.
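To make the MCMC idea concrete, here is a minimal random-walk Metropolis sampler in Python, one member of the MCMC family. It is an illustrative sketch, not SAS PROC MCMC: the Normal(3, 1) target, the step size, the burn-in length, and the names `log_target` and `metropolis` are all assumptions chosen so the result is easy to check.

```python
import math
import random


def log_target(theta):
    """Unnormalized log-density of the example target, a Normal(3, 1)."""
    return -0.5 * (theta - 3.0) ** 2


def metropolis(log_density, n_samples, start=0.0, step=1.0, seed=42):
    """Random-walk Metropolis sampler returning the chain of visited states."""
    rng = random.Random(seed)
    theta, current = start, log_density(start)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        chain.append(theta)
    return chain


chain = metropolis(log_target, 20_000)
kept = chain[2_000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((t - mean) ** 2 for t in kept) / (len(kept) - 1)
print(round(mean, 2), round(var, 2))
```

Only the unnormalized density is needed, because the acceptance rule uses a ratio of densities; that is what makes the approach practical when the normalizing constant of a pattern distribution is unknown.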

Page 24: SAS Visual analytics

GRAPH ANALYSIS

[Figure: a network partitioned into communities]

The objectives of network analysis are:

• Identifying the subnetworks (communities) with a high potential for information exchange.

• Measuring changes over time.

• Producing initiatives that increase the enterprise's presence in individual communities, based on each community's spreading strength.
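Identifying communities requires a way to score how well a candidate split captures dense internal exchange. The slide does not name a method; modularity (links inside each community compared with the number expected at random given the node degrees) is one widely used score and is assumed here. A minimal Python sketch for an undirected network; `modularity` and the two-group example network are illustrative.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted network.

    Q = sum over communities c of  L_c / m - (d_c / (2m))**2,
    where m is the number of links, L_c the links inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in communities:
        members = set(community)
        internal = sum(1 for u, v in edges if u in members and v in members)
        total_degree = sum(degree.get(node, 0) for node in members)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Two tightly knit groups joined by a single bridge link (3-4)
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
q_split = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_single = modularity(edges, [{1, 2, 3, 4, 5, 6}])
print(round(q_split, 3), round(q_single, 3))
```

The split along the bridge scores about 0.357, while lumping everyone together scores 0, matching the intuition that each group exchanges information mostly internally. Recomputing the score on successive snapshots of the network is one way to approach the second objective, measuring changes over time.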

Page 25: SAS Visual analytics

GRAPH ANALYSIS

A network is a collection of relationships among nodes, expressed by links. A node is an individual characterized by qualities that can be transmitted through the links (impulses). A link is the relationship that connects two nodes; it can be outgoing, incoming, or undirected.

[Figure: two nodes connected by a link]
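The definitions above (nodes, links that are outgoing, incoming, or undirected, and impulses that travel along links) map onto a simple adjacency structure. A minimal Python sketch; `Network`, `add_link`, and `reach` are hypothetical names, and modeling an impulse as plain reachability along links is a simplifying assumption.

```python
from collections import defaultdict, deque


class Network:
    """Nodes joined by links that are directed (outgoing/incoming) or undirected."""

    def __init__(self):
        self.nodes = set()
        self.out_links = defaultdict(set)  # node -> nodes it sends impulses to
        self.in_links = defaultdict(set)   # node -> nodes it receives impulses from

    def add_link(self, a, b, directed=True):
        """Add a link a -> b; an undirected link is stored in both directions."""
        self.nodes.update((a, b))
        self.out_links[a].add(b)
        self.in_links[b].add(a)
        if not directed:
            self.out_links[b].add(a)
            self.in_links[a].add(b)

    def reach(self, source):
        """All nodes an impulse starting at source can reach by following links (BFS)."""
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            for nxt in self.out_links.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


net = Network()
net.add_link("A", "B")                  # outgoing from A, incoming to B
net.add_link("B", "C", directed=False)  # no direction: usable both ways
net.add_link("D", "A")
print(sorted(net.reach("A")), sorted(net.reach("C")))
```

Keeping separate outgoing and incoming maps makes both directions cheap to query, so in-degree ("who influences this node") and out-degree ("whom this node can influence") are each a single lookup.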

Page 26: SAS Visual analytics

AGENDA

From DBMS to BIG DATA

Big Data Analytics

Architectural Considerations

Methods

Data Discovery: Visual Analytics

Page 27: SAS Visual analytics

…provide very easy-to-use, yet sophisticated, statistical graphics tools to all of your users?

…use ad hoc exploration and visualizations to analyze multivariate results?

…quickly produce mobile dashboards and reports that convey more foresight than hindsight?

SAS® VISUAL ANALYTICS

A single solution for statistical visualization and reporting

Page 28: SAS Visual analytics

SAS® VISUAL ANALYTICS

BUSINESS VISUALIZATION DRIVEN BY ANALYTICS

• EXPLORATION AND VISUALIZATION

• POWER OF ANALYTICS

• RAPID DELIVERY OF MOBILE INSIGHTS

Page 29: SAS Visual analytics

BUSINESS VISUALIZATION: THE DIFFERENCE BETWEEN RAPID INSIGHT AND FAST INFORMATION

DATA VISUALIZATION: EXPLORATION

ANALYTIC VISUALIZATION: DISCOVERY

Page 30: SAS Visual analytics

BENEFITS: INCREASE THE USE OF ANALYTICS AND BI

• Self-service

• Easy to use Analytics

• Work with more data

• Reporting and Dashboards

• Mobile BI

• Collaboration

Page 31: SAS Visual analytics

SAS® VISUAL ANALYTICS: MEETING YOUR BUSINESS NEEDS THROUGH FLEXIBILITY

• Traditional "on-premises" deployments

• Public, private, and hybrid cloud

• SAS Cloud & SAS Solutions on Demand

