Copyright © 2012, SAS Institute Inc. All rights reserved.

SAS DATA LOADER FOR HADOOP

CUSTOMER CHALLENGES AND SOLUTION BENEFITS

TASS – SEPTEMBER 2015

JAMES WAITE

SAS DATA LOADER FOR HADOOP - AGENDA

What Is Hadoop?

Big Data Challenges

Hadoop Challenges

Data Loader for Hadoop

Demo

Additional Resources

WHAT IS HADOOP?

HADOOP WHAT IT PROVIDES

Open-source Software
• Free to download, use and contribute to

Framework
• All program elements, connections, etc. are provided by the software

Massive Storage
• Framework breaks big data into blocks, which are stored on clusters of commodity hardware

Processing Power
• Concurrently processes large amounts of data using multiple low-cost computers
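The "Massive Storage" point above can be made concrete with a little arithmetic. The sketch below is illustrative only and uses no Hadoop classes: the class and method names are hypothetical, and the 128 MB block size and replication factor of 3 are assumed here as the usual HDFS defaults in Hadoop 2.x, not values stated on the slide.

```java
// Hypothetical helper, not part of Hadoop: shows how a file maps onto fixed-size blocks.
final class BlockLayout {
    // Assumed defaults (Hadoop 2.x): 128 MB blocks, each stored as 3 replicas.
    static final long BLOCK_SIZE_BYTES = 128L * 1024 * 1024;
    static final int REPLICATION = 3;

    /** Number of blocks needed to hold the file; the last block may be only partly filled. */
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and block size positive");
        }
        // Ceiling division without floating point.
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /** Total number of block replicas spread across the cluster's machines. */
    static long replicaCount(long fileSizeBytes, long blockSizeBytes, int replication) {
        return blockCount(fileSizeBytes, blockSizeBytes) * replication;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println("blocks   = " + blockCount(oneGiB, BLOCK_SIZE_BYTES));
        System.out.println("replicas = " + replicaCount(oneGiB, BLOCK_SIZE_BYTES, REPLICATION));
    }
}
```

Because each block is an independent unit, different machines can store and process different blocks at the same time, which is what the "Processing Power" point relies on.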

HADOOP WHAT IT OFFERS

Computing Power

• Distributed computing

Flexibility

• No need to preprocess data

Fault Tolerance

• Processing failover, data redundancy

Low Cost

• Open source, runs on commodity hardware

Scalability

• Add nodes as needed, with little administration

TERMINOLOGY TRADITIONAL

Primary Key

Index

Table

Normalize

Foreign Key

Relationship

Constraint

RDBMS

SQL

Database

Primary Key

Schema

TERMINOLOGY HADOOP

Hadoop

Pig

Block

Hive

Cloudera

NameNode

YARN

Cluster

JobTracker

HDFS

DataNode

MapReduce

TERMINOLOGY “IT’S ALL GREEK” TO ME (MOST)!

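Είναι όλα τα ελληνικά μου. (It's all Greek to me.)

Παραδεισένιο νησί. (Paradise island.)

Όμορφη αρχιτεκτονική. (Beautiful architecture.)

Ο Θεός της βροντής. (The god of thunder.)

Τραγωδία. (Tragedy.)

Ολυμπιακοί Αγώνες. (Olympic Games.)

Γιαούρτι. (Yogurt.)

Ελληνορωμαϊκή. (Greco-Roman.)

Μεγάλοι της λογοτεχνίας και της φιλοσοφίας. (Greats of literature and philosophy.)

Σαλάτα. (Salad.)

Αρχαίοι ναοί. (Ancient temples.)

Μεσογείου. (Of the Mediterranean.)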

BIG DATA DRIVERS AND CUSTOMER CHALLENGES

CHALLENGE HADOOP SKILLS SHORTAGE

Performing even the simplest tasks in Hadoop typically requires mastering disparate tools and writing hundreds of lines of code.

Fact: Only a limited number of users have the necessary Hadoop skills:

• MapReduce

• Pig Latin

• HiveQL

• HDFS

• Sqoop and Oozie

SAS & INTEL STUDY - HADOOP ADOPTION & CHALLENGES

Research summary: SAS and Intel asked more than 300 IT managers from the largest companies in Denmark, Finland, Norway and Sweden about the adoption of Big Data analytics and Hadoop.

http://nordichadoopsurvey.com

Results & Key Findings

Primary reason for considering Hadoop
• 60% cited advanced analytics, data discovery, or use as an analytical lab
• 22% would like to speed up processing

Adoption / Obstacles
• 35% cited “Resources and Competencies”

HADOOP BIG DATA CHALLENGES

Source: Gartner (Sep 2014), Big Data Investment Grows but Deployments Remain Scarce in 2014 By Nick Heudecker, Lisa Kart

CHALLENGE HADOOP SKILLS SHORTAGE

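/* Sort by subject ID so that BY-group processing can be used */
proc sort data=dsn out=temp;
  by usubjid;
run;

/* Keep the last record of each usubjid that occurs more than once */
data unique;
  set temp;
  by usubjid;
  if not first.usubjid and last.usubjid;
run;

/* Keep only the first record of each usubjid (one row per subject) */
data nodups;
  set temp;
  by usubjid;
  if first.usubjid;
run;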

CHALLENGE HADOOP SKILLS SHORTAGE

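package org.myorg;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class CalculateDistinct {

    // Mapper: emits (line, 1) for every input line
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text("");

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            word.set(value.toString());
            output.collect(word, one);
        }
    }

    // Reducer: counts how many times each distinct line occurred
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += 1;
                values.next();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // main() follows below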

    // Driver: configures the job and submits it to the cluster
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CalculateDistinct.class);
        conf.setJobName("Calculate Distinct");

        // Types of the keys and values emitted by the reducer
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        // Plain text in, plain text out
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = HDFS input path, args[1] = HDFS output directory
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

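# Compile against the Hadoop core JAR (the output directory must exist first)
mkdir -p CalculateDistinct
javac -classpath hadoop-0.20.1-dev-core.jar -d CalculateDistinct/ CalculateDistinct.java

# Package the compiled classes into a JAR
jar -cvf CalculateDistinct.jar -C CalculateDistinct/ .

# Submit the job: main class, HDFS input file, HDFS output directory
hadoop jar CalculateDistinct.jar org.myorg.CalculateDistinct /user/john/in/abc.txt /user/john/out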
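
For readers who want to see what the job above computes without setting up a cluster, here is a minimal single-JVM sketch. It is an illustration only, not the presenter's code: the class name LocalDistinctCount and the sample subject IDs are hypothetical, and it uses plain Java collections rather than any Hadoop API. It imitates the three phases: map each line to a key, group identical keys together, and reduce each group to a count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: a single-JVM imitation of the map, shuffle and reduce phases.
final class LocalDistinctCount {

    /** Returns every distinct input line with the number of times it occurs, sorted by key. */
    static Map<String, Integer> countDistinct(List<String> lines) {
        return lines.stream()
                .collect(Collectors.groupingBy(
                        line -> line,                        // "map": the whole line is the key
                        TreeMap::new,                        // "shuffle": group and sort by key
                        Collectors.summingInt(line -> 1)));  // "reduce": add 1 per occurrence
    }

    public static void main(String[] args) {
        // Hypothetical sample input: one subject ID per line.
        List<String> input = Arrays.asList("S001", "S002", "S001", "S003", "S002", "S001");
        countDistinct(input).forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

The logic fits in one expression here because everything runs in one process; the length of the Hadoop version comes largely from the job configuration, type declarations and packaging that distributed execution requires.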

CHALLENGE HADOOP SKILLS

The skill set required to leverage the many benefits of a Hadoop-driven data environment is substantial, and often requires training in many areas.

http://hortonworks.com/training/class/applying-data-science-using-apache-hadoop/

http://university.cloudera.com/instructor-led-training/introduction-to-data-science---building-recommender-systems

CHALLENGE USER TOOLS ARE NOT BIG DATA ENABLED

Big data brings new requirements:
• Access to HDFS
• Parallel loads
• New native file types
• Knowledge of file structures
• New languages & code
• Need to transform data in-cluster

User tools are not engineered to process data inside Hadoop.
• Tools are not optimized for Hadoop
• Users move data out of Hadoop to do data management and data quality
• This requires more processing time
• Data is duplicated and more storage is required
• Users do not use the Hadoop platform as it was designed

SOLUTION SAS & HADOOP

SAS has worked closely with the industry leaders in Hadoop development, and has developed tools and solutions that let SAS work with and leverage Hadoop.

Rather than asking users to adapt to entirely new languages to leverage Hadoop, SAS has adapted traditional SAS routines and procedures to leverage Hadoop, the end result being “SAS users can stay in SAS”.

• SAS/ACCESS Interface to Hadoop

• DS2 Programming

• SAS Data Loader

SOLUTION SAS & HADOOP

Rather than asking users to adapt to entirely new languages to leverage Hadoop, SAS has adapted traditional SAS routines and procedures to leverage Hadoop, the end result being “SAS users can stay in SAS”.

DS2 Programming: Essentials

https://support.sas.com/edu/schedules.html?id=1798&ctry=CA

DS2 Programming Essentials with Hadoop

https://support.sas.com/edu/schedules.html?id=2468&ctry=CA

THE KEY CHALLENGE - CLOSING THE GAPS IN THE DATA TO DECISION LIFECYCLE

Figure (diagram labels): TIME TO DECISION; VALUE CAPTURED; roles: Business Manager, IT Systems / Management, Data Scientist / Statistician, Business Analyst; gaps: Hadoop Skill Shortage, User Tools are Not Hadoop Enabled

BIG DATA MANAGEMENT - ANALYSTS TAKE

Recommendation

“Use self-service interactive data preparation tools to enhance analyst productivity.” and

“improve the quality of data”

– Gartner, “Data Preparation Is Not an Afterthought”

MARKET TRENDS SELF-SERVICE DATA PREPARATION

Typically, data preparation is 70-80% of the work involved in any analytic project. That number increases as complexities of the data environment increase.

The rise of self-service data-preparation tools … is putting data management directly into the hands of analysts

SAS Data Loader for Hadoop showcases the company's solid engineering talent and reputation for building high-quality software

SAS DATA LOADER FOR HADOOP

SOLUTION OVERVIEW

SAS DATA LOADER FOR HADOOP - KEY FEATURES

Point-and-click UI designed for self-service data preparation

Leverage existing skills to prepare data on Hadoop as used on other data sources

Consistency & reuse: apply existing DQ standards on Hadoop data

Familiar toolset for the end-to-end analytical lifecycle

Purpose-Built to run on Hadoop, keeps it simple and focused

Enables parallel data movement and data quality tasks without writing code

Loads data to the SAS LASR Analytic Server

Big Compute: Moves the processing to the data

SAS DATA LOADER FOR HADOOP…

A “purpose-built”, easy-to-use data management solution that specifically addresses acquiring, structuring, cleaning and transforming data inside Hadoop.

SAS Data Loader for Hadoop is a smart approach that turns the Hadoop environment into a productive environment, where barriers are removed and data is accessible and usable.

SAS DATA LOADER ENABLES ORGANIZATIONS TO…

• Manage data inside Hadoop
• Reduce complexity of Hadoop
• Accelerate business user adoption

CAPABILITIES - SAS DATA LOADER FOR HADOOP

1 ACQUIRE DATA / DISCOVER DATA
Access data, move it into Hadoop, and assess the data structure and content
• Copy Data to Hadoop
• Profile Data
• Identification Analysis
• Query

2 TRANSFORM DATA
Select data of interest, manipulate it, and structure it into the data format desired
• Query
• Select Columns
• Apply Filters
• Map Columns
• Sort / Order
• Calculate Columns
• Transpose data
• Aggregate
• Transform data

3 CLEANSE DATA
Put data into a consistent format
• Validate
• Parse
• Standardize

4 INTEGRATE DATA
Combine datasets, including data that has no common key, remove duplicate data, and create new data points through aggregation
• Join
• Create Match codes
• Sort & De-duplicate
• Aggregate
• Run a SAS program

5 DELIVER DATA
Load datasets into the SAS LASR in-memory analytic server, create new Hadoop tables, and deliver data to other databases and apps
• Load SAS LASR
• Create tables
• Create views
• Copy from Hadoop

INTRODUCING SAS DATA LOADER FOR HADOOP

Self-service big data preparation for business users

Certified by Hortonworks and Cloudera

PRIMARY AUDIENCE: WHO IS SAS DATA LOADER DESIGNED FOR?

• Business Users
• Data Analysts, Data Scientists, Statisticians
• Data Management Specialists

SAS DATA LOADER FOR HADOOP - BENEFITS

• Users of all skill levels can manage data in Hadoop

• Users can manipulate Hadoop data to fit their specific needs

• No need to write code

• Increases worker productivity and improves data quality

• Leverages the Hadoop cluster, including:
   • Parallel processing
   • Minimal data movement

• Enables reuse of skills you already have

• Unlocks and accesses many types of data

ADDITIONAL RESOURCES

FOR MORE INFORMATION

• Learn more about SAS Data Loader for Hadoop

• SAS Data Loader for Hadoop

• Learn more about SAS Data Management:

• SAS Data Management

• Learn more about SAS Hadoop offerings:

• SAS Solutions for Hadoop

• Follow us on Twitter: @sasdatamgmt

• Like us on Facebook: SAS Software

TRAINING

• Big Data Matters Webinar Series:

• Big Data On-Demand Webinar Series

• SAS Training:

• Introduction to SAS and Hadoop

• DS2 Programming Essentials with Hadoop

• Data Science: Building Recommender Systems with SAS and Hadoop

THANK YOU!

USE CASES

USERS BUSINESS USERS

Activities:
• Self-service access to data
• Query and manipulate data
• Copy data to/from Hadoop
• Load data into SAS LASR

USERS DATA ANALYSTS, DATA SCIENTISTS, & STATISTICIANS

Activities:
• Create an analytics-ready dataset
• Discover new data sources
• Transform and manipulate data
• Optional: Write SAS DS2 code
• Load data into SAS LASR server

Figure (diagram labels): Event data; Customer data; Log files; Data Preparation; Analytics ready dataset

USERS DATA MANAGEMENT SPECIALISTS

Activities:
• Apply enterprise data management practices to Hadoop
• Manage data with discipline inside Hadoop
• Reuse data quality standards inside Hadoop
• Copy data to/from Hadoop
• Optimize SAS code to run in Hadoop
• Learn from Hadoop data discoveries
• Apply knowledge gained in enterprise environment

Figure (diagram label): Hadoop

BUSINESS USER USE CASE - SELF SERVICE BIG DATA ON-BOARDING, EXPLORATION AND DISCOVERY

Activities
• User copies data from a data source into Hadoop
• User profiles the table to learn the structure/content of the data
• User queries the data and creates a new table specific to their needs
• User loads the new table into SAS LASR

Figure (diagram label): SAS® LASR ANALYTIC SERVER

BUSINESS USER USE CASE - CONTINUES EXPLORATION AND DISCOVERY USING SAS VA…

Figure (diagram labels): SAS Data Loader for Hadoop; SAS Visual Analytics

DATA SCIENTIST USE CASE - BIG DATA PREPARATION FOR ADVANCED ANALYTICS

Activities
• User accesses a previously run profile report showing table information
• User defines a new table
• Creates new columns using calculations
• Pivots / transposes the table
• Uses functions to aggregate variables
• Writes a SAS DS2 program to append records with a calculated score
• Sorts the data and applies filters
• User then loads the table into SAS LASR
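
To give a feel for what a few of the activities above amount to (apply a filter, calculate a column, aggregate, sort), here is a small sketch in plain Java. It is not what SAS Data Loader generates or runs: the Event record, its fields, and the class name PrepSketch are all hypothetical, and the pivot, DS2 scoring and LASR load steps are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: in-memory stand-ins for filter, calculated column, aggregate and sort.
final class PrepSketch {

    /** Hypothetical input row: one purchase event per line of raw data. */
    static final class Event {
        final String customerId;
        final double unitPrice;
        final int quantity;

        Event(String customerId, double unitPrice, int quantity) {
            this.customerId = customerId;
            this.unitPrice = unitPrice;
            this.quantity = quantity;
        }
    }

    /** Total spend per customer, ignoring events priced below minUnitPrice; keys come back sorted. */
    static Map<String, Double> totalSpendByCustomer(List<Event> events, double minUnitPrice) {
        return events.stream()
                .filter(e -> e.unitPrice >= minUnitPrice)            // apply filter
                .collect(Collectors.groupingBy(
                        e -> e.customerId,                           // aggregate per customer
                        TreeMap::new,                                // sort by customer ID
                        Collectors.summingDouble(
                                e -> e.unitPrice * e.quantity)));    // calculated column: spend
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("C2", 10.0, 2),
                new Event("C1", 5.0, 1),
                new Event("C2", 0.5, 4),   // dropped by the filter below
                new Event("C1", 2.0, 3));
        totalSpendByCustomer(events, 1.0)
                .forEach((id, total) -> System.out.println(id + "\t" + total));
    }
}
```

In the product these steps are configured through the point-and-click interface and executed inside the Hadoop cluster; the sketch only shows the shape of the transformation on a handful of rows.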

www.SAS.com

THANK YOU!