Date posted: 22-Jan-2018
Category: Technology
Uploaded by: sumit-sarkar
Journey to SAS Analytics Grid with SAS, R, Python
Benjamin Zenick, Chief Operating Officer - Zencos
Sumit Sarkar, Chief Data Evangelist - Progress DataDirect
© 2016 Progress Software Corporation and/or its subsidiaries or affiliates. All rights reserved.
Audio Bridge Options & Question Submission
Agenda
Differences between traditional and Grid deployments for SAS
Best practices and lessons learned in deploying an Analytics Grid
How to deliver an open analytics strategy for SAS, R, Python and others
Popular data sources for advanced analytics
POLL
WHERE ARE YOU IN YOUR ANALYTICS JOURNEY?
DESKTOP ANALYTICS
CLIENT/SERVER ANALYTICS
GRID ANALYTICS
CLOUD ANALYTICS
OTHER
Differences between traditional and Grid deployments for SAS
The Evolution of Analytics
Businesses started with large and expensive central mainframes
– Mainframes were limited by early storage and processing technology
– Connectivity and user interfaces to data were limited by “dumb” terminals
– Expansion was limited by proprietary chassis design
– Connecting multiple mainframes was expensive, challenging, or impossible
Analytics Today
• Modernization moved away from Mainframes
• Moved toward client/server solutions, workstations, storage appliances, and networking
• Shortcoming of centralized datacenters: administrative and performance bottlenecks
Example of Traditional Deployment
What benefits do grid deployments provide?
• Standardization supporting multiple ecosystems
• Streamline Administrative support
• Better tools for analytics and administration
• Centralizing and improving management
• Size & Scalability
Example of Grid Deployment
Signs your organization is ready to consider an HPC or Grid solution…
• Decrease in cost benefits
• Current model doesn’t scale well
• Massively Parallelized Processing
• Administrative needs continue to grow and grow
• High(er) Availability is possible
• Faster (Disaster) Recovery
Top Considerations for “Modernization”
• Why?
• Who?
• What?
• Where?
• When?
Best practices and lessons learned in deploying an Analytics Grid
Best Practices
• Preparation
• Technologies
• Plan
• Time
• Expectations
• Team
• Transition
• Users
• Support
• Goal Alignment
Lessons Learned
• Invest in a meaningful assessment
• Plan to purchase and build Test and Disaster Recovery environments
• Understand the applications and use cases
• Outline support model for legacy projects
• Consider your post-implementation needs
• Expect the unexpected
How to deliver an open analytics strategy for SAS, R, Python and others
POLL
WHICH LANGUAGE(S) ARE COMMONLY USED IN YOUR ORGANIZATION?
SAS
Python
R
SPSS
OTHER
SAS and Open Analytics across …
• SAS Viya
• SAS Grid Manager
• SAS (open data access and grid management for native language support)
SAS Grid Manager
Image from SAS webinar: https://www.evensi.us/webinar-taking-r-and-python-from-good-to-great-with-sas-/204358443
SAS with Open Data Access (ODBC)
• Access external data in the supported access modes through data-source-specific SAS/Access interfaces.
• Use the generic SAS/Access interface to ODBC with an open ODBC driver; the same driver gives Python and R direct access.
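One way to keep SAS, R and Python pointed at the same data source is to share a single ODBC connection string. The sketch below is only an illustration, not part of the original deck: it assembles a DSN-based string and brace-quotes values containing `;`, `{` or `}`, following the common ODBC convention of doubling a closing brace. The helper names and the `uid`/`pwd` values are hypothetical, and individual drivers may accept different keywords.

```python
def odbc_value(value):
    """Brace-quote a value that contains characters special to ODBC connection strings."""
    if any(ch in value for ch in ";{}") or value != value.strip():
        # Common ODBC convention: wrap in braces and double any closing brace.
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_connection_string(dsn, **attrs):
    """Build 'DSN=...;KEY=value;...' from a DSN name and optional attributes."""
    parts = [("DSN", dsn)] + [(key.upper(), val) for key, val in attrs.items()]
    return ";".join("%s=%s" % (key, odbc_value(val)) for key, val in parts)


# "Spark Next" is the DSN used in the R example; uid and pwd are placeholders.
conn_str = build_connection_string("Spark Next", uid="analyst", pwd="s3cret;1")
print(conn_str)
```

The resulting string could then be handed to an ODBC client such as `pyodbc.connect(conn_str)`, assuming the DSN is configured on the machine.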
SAS and Open Analytics | SAS Grid (Open Data Access via ODBC)
[Diagram: an Analytics Grid (Controller and Workers) under an Open Grid Manager, with Open Data Access via ODBC to RDBMS, Big Data, NoSQL and Cloud data sources over TCP or HTTPS]
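The controller/worker layout in this slide can be pictured with a small stand-in. The sketch below is not SAS Grid Manager and uses no SAS API; it assumes a thread pool inside one Python process plays the role of the worker nodes, and `run_job` is a hypothetical placeholder for an independent analytics job.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    """Placeholder analytics job: sum of squares below job_id * 1000."""
    n = job_id * 1000
    return job_id, sum(i * i for i in range(n))


def controller(job_ids, workers=4):
    """Dispatch independent jobs to a pool of workers and collect results by job id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_job, job_ids))


results = controller([1, 2, 3])
for job_id in sorted(results):
    print(job_id, results[job_id])
```

On a real grid the workers would be separate machines and the controller would also handle queuing, priorities and failover; the point here is only that independent jobs can be fanned out and gathered back.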
R ODBC Example
library(RODBC)
# Make a connection using your DSN name
conn <- odbcConnect("Spark Next")
# Execute a SQL Tables call
sqlTables(conn)
# Execute a SQL columns call on the table with our energy data
sqlColumns(conn, "energyconsumption")
# Bind the results of a SQL query for plotting
data <- sqlQuery(conn, "SELECT * FROM energyconsumption WHERE country IN ('China', 'United States', 'Canada', 'France', 'Germany', 'Italy', 'Japan')")
# Attach the data for plotting access
attach(data)
Python ODBC Example
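import pyodbc
import getpass
import sys


def show_odbc():
    """Print a numbered list of the configured ODBC data sources and return their names."""
    sources = pyodbc.dataSources()
    dsns = list(sources.keys())
    sl = ['%d. %s' % (i, dsn) for i, dsn in enumerate(dsns, start=1)]
    print('\n'.join(sl))
    return dsns


def listTables(cursor):
    """Print the name of every table visible through the cursor."""
    for row in cursor.tables():
        print(row.table_name)


def executeSelectQuery(cursor, cnxn):
    """Prompt for a SELECT statement and execute it on the cursor."""
    query = input('Enter the SELECT Query:')
    cursor.execute(query)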
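The slide stops at `cursor.execute(query)`. A possible next step, reading column names and rows back, is sketched below. To keep the sketch runnable without an ODBC driver or DSN, it assumes the standard-library `sqlite3` module as a stand-in; both it and pyodbc follow the Python DB-API (PEP 249) and use `?` parameter markers. The `year` and `consumption` columns and all row values are hypothetical placeholders, not data from the webinar.

```python
import sqlite3


def list_tables(cursor):
    """Return table names. sqlite3 has no cursor.tables(), so query its catalog instead."""
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in cursor.fetchall()]


def run_select(cursor, query, params=()):
    """Run a SELECT and return (column_names, rows) using DB-API calls only."""
    cursor.execute(query, params)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


# In-memory stand-in database; with pyodbc this would be pyodbc.connect("DSN=...").
cnxn = sqlite3.connect(":memory:")
cursor = cnxn.cursor()
cursor.execute(
    "CREATE TABLE energyconsumption (country TEXT, year INTEGER, consumption REAL)"
)
cursor.executemany(
    "INSERT INTO energyconsumption VALUES (?, ?, ?)",
    [("Canada", 2000, 1.0), ("Japan", 2000, 2.0), ("Brazil", 2000, 3.0)],
)

tables = list_tables(cursor)
columns, rows = run_select(
    cursor,
    "SELECT country, consumption FROM energyconsumption "
    "WHERE country IN (?, ?) ORDER BY country",
    ("Canada", "Japan"),
)
cnxn.close()

print(tables)
print(columns)
for row in rows:
    print(row)
```

Passing values as parameters rather than pasting them into the SQL text is a design choice worth keeping when the query comes from user input, as it does in `executeSelectQuery`.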
DataDirect ODBC is engineered for GRID and Cloud
Deliver advanced functionality over OSS to become SAS OEM Partner
Run 85+ million QA tests on our suite of connectors
Performance labs measure throughput and resource utilization (CPU and memory)
Focus on security features for customers to achieve regulatory compliance
Popular data sources for advanced analytics
Popular Relational/Analytics Data Sources
SQL Server 18.70%
Oracle 12.89%
MySQL 12.77%
Progress OpenEdge 7.93%
PostgreSQL 5.65%
Microsoft SQL Azure 5.27%
IBM DB2 4.76%
SQLite 3.68%
Teradata 2.61%
SAP HANA 2.30%
MariaDB 2.25%
Sybase ASE 1.92%
Amazon Redshift 1.79%
Informix 1.64%
Sybase IQ 1.30%
Netezza 1.25%
Other (please specify): 1.13%
Amazon Aurora 1.00%
Not sure 0.97%
Pivotal Greenplum 0.87%
Google BigQuery 0.77%
Vertica 0.61%
Popular Big Data Sources
Hadoop Hive 18.53%
Spark SQL 8.17%
Hortonworks 7.97%
Cloudera CDH 7.87%
Cloudera Impala 7.47%
Apache Solr 7.37%
Oracle BDA 6.67%
Amazon EMR 5.98%
Apache Sqoop 5.48%
MapR 5.38%
IBM BigInsights 4.68%
Apache Storm 4.08%
Apache Drill 2.39%
Apache Phoenix 2.39%
SAP Altiscale 2.19%
Pivotal HD 1.89%
Presto 0.80%
GemFireXD 0.70%
Popular NoSQL Sources
MongoDB 35.60%
Cassandra 14.57%
HBase 10.34%
Oracle NoSQL 9.01%
Redis 8.45%
Other (please specify): 6.01%
Couchbase 5.78%
DynamoDB 2.78%
DataStax Enterprise 2.22%
SimpleDB 2.22%
MarkLogic 1.67%
Aerospike 0.78%
Riak 0.56%
What about SaaS?
Data Source | API
Eloqua | Web Services API (REST/SOAP); Bulk and non-Bulk APIs; no query language
Oracle Service Cloud | Web Services APIs (REST/SOAP); ROQL
Google Analytics | Hypercube (query limits of 10 metrics grouped by max of 7 dimensions)
Veeva CRM | SOAP, BULK, Metadata APIs; SOQL
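API limits like the Google Analytics one above (10 metrics, at most 7 dimensions per query) are the kind of detail a client or driver has to handle. The sketch below is only an illustration under that stated limit: the function names are hypothetical, no Google API is called, and splitting metrics into several requests is one possible workaround rather than a documented procedure.

```python
MAX_METRICS = 10     # per-query metric limit quoted on the slide
MAX_DIMENSIONS = 7   # per-query dimension limit quoted on the slide


def validate_request(metrics, dimensions):
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if len(metrics) > MAX_METRICS:
        problems.append("too many metrics: %d > %d" % (len(metrics), MAX_METRICS))
    if len(dimensions) > MAX_DIMENSIONS:
        problems.append(
            "too many dimensions: %d > %d" % (len(dimensions), MAX_DIMENSIONS)
        )
    return problems


def split_metrics(metrics, size=MAX_METRICS):
    """Chunk a long metric list into groups that each fit the metric limit."""
    return [metrics[i:i + size] for i in range(0, len(metrics), size)]


# Placeholder metric and dimension names, for illustration only.
wanted_metrics = ["metric_%d" % i for i in range(23)]
wanted_dimensions = ["dim_%d" % i for i in range(3)]

print(validate_request(wanted_metrics, wanted_dimensions))
for chunk in split_metrics(wanted_metrics):
    print(len(chunk), validate_request(chunk, wanted_dimensions))
```

Each chunk would be queried with the same dimensions so the partial results can be joined back on those dimension columns.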
Supported ODBC Data Sources for SAS/Access
Apache Hadoop Hive 0.8.0 and higher
Amazon EMR 2.1.4 and higher
Amazon Redshift
Apache Spark SQL 1.2, 1.3, 1.4, 1.5
Cloudera CDH update 4 and higher
Cloudera Impala 1.0, 1.1, 1.2, 1.3, 1.4
Cloudera Impala 2.0, 2.1, 2.2
Hortonworks 1.3 and higher
IBM BigInsights 3.0 and higher
MapR 1.2 and higher
Pivotal HD 2.0.1 and higher
DB2 V9.1, V9.5, V9.7, 9.8 for Linux, UNIX, Windows
DB2 V8.x for LUW
DB2 11 for z/OS*
DB2 V10 for z/OS
DB2 V9.1 for z/OS
DB2 UDB V8.1 for z/OS
DB2 I 7.1, 7.2* (DB2 UDB V7R1, V7R2 for iSeries)
DB2 I 6.1 (DB2 UDB V6R1 for iSeries)
DB2 for I 5/OS (DB2 UDB V5R4 for iSeries)
Eloqua (Oracle Marketing Cloud)
Financial Force
Google Analytics
Greenplum 4, 4.1, 4.2, 4.3
Greenplum 3.3
Hubspot
Informix Dynamic Server 12.1*
Informix Dynamic Server 11.0, 11.5, 11.7
Informix Dynamic Server 10.0
Informix Dynamic Server 9.2, 9.3, 9.4
Marketo
Microsoft Dynamics CRM 2011 Rollup 16, 2013, 2015
Microsoft SQL Server 2014*
Microsoft SQL Server 2012
Microsoft SQL Server 2008 R1, R2
Microsoft SQL Server 2005
Microsoft SQL Server 2000 Desktop Engine (MSDE 2000)
Microsoft SQL Server 2000
Microsoft SQL Azure*
MongoDB 3.0
MongoDB 2.2, 2.4, 2.6
MySQL Enterprise Edition 5.0, 5.1, 5.5, 5.6*
Oracle 12c R1 (12.1)*
Oracle 11g R1, R2 (11.1, 11.2)
Oracle 10g R1, R2 (10.1, 10.2)
Oracle 9i R1, R2 (9.0.1, 9.2)
Oracle 8i R3 (8.1.7)
Oracle Service Cloud
Oracle Sales Cloud
Pivotal HAWQ 1.1*, 1.2*
PostgreSQL 9.0, 9.1, 9.2, 9.3, 9.4*
PostgreSQL 8.2, 8.3, 8.4
Progress OpenEdge 11.0, 11.1*, 11.2*, 11.3*, 11.4*
Progress OpenEdge 10.1.x, 10.2.x
Progress Rollbase 2.0 and higher*
REST API (via OpenAccess)
SAP Adaptive Server Enterprise 16.0*
ServiceMax
SugarCRM 7.1.6 and higher*
Sybase Adaptive Server Enterprise 15.0, 15.5, 15.7
Sybase Adaptive Server Enterprise 12.0, 12.5, 12.5.x
Sybase Adaptive Server Enterprise 11.9
Sybase IQ 16.0*
Sybase IQ 15.0, 15.1, 15.2, 15.3, 15.4
Veeva CRM
Blue text indicates cloud hosted
Blue text* indicates cloud hosted with on-premises option
NEW cross data center access for SAS/Access interface to ODBC (over HTTPS)
Learn More about Data Access for SAS Analytics
What DataDirect Does for SAS Shops
“Taking R and Python from good to great with SAS” [Webinar hosted by SAS in April 17]
Zencos Consulting Blog
Tech Articles on configuring SAS with ODBC:
• SAS/Access 9.4 interface to ODBC Tutorial across popular data sources such as SQL Server, Salesforce and Amazon Redshift
• SAS/Access 9.4 interface to ODBC Tutorial across cloud data sources such as Marketo and Eloqua
Wrap Up with Q&A
Slides and recording will be made available to each attendee
Visit www.datadirect.com to learn more about ODBC drivers engineered for analytics
Please enter your questions in the chat...