Data Analytics In The Cloud Soa World

Date post: 11-May-2015
Upload: tomplunkett
Description:
Data Analytics in the Cloud, presented at SOA World as part of the SOA & Cloud Computing track. The talk focuses on open source software, SOA, data analytics, and Apache Hadoop.
Transcript
Page 1: Data Analytics In The Cloud Soa World

Open Source SOA in the Cloud: Data Analytics in the Cloud

Tom Plunkett, Michael Sick

SOA World 2009

[email protected]@serenesoftware.com

Page 2: Data Analytics In The Cloud Soa World

This work is licensed under a Creative Commons Attribution 3.0 United States License

Tom Plunkett & Michael Sick

Data Analytics in the Cloud: Overview

Introductions
• Who are we?
• Baselines & definitions

Opportunity
• Targeted Use Cases
• Technical convergence & opportunities
• Commercial opportunities & drivers

Technology & Standards
• State of current technology
• Commercial & FOSS solutions
• Hadoop Focus

Challenges
• Challenges to Meet Target Use Cases
• Economic challenges & the role of “free”
• Wide scale challenges in Cloud and data analytics

Questions
• Questions
• Contacts

Page 3: Data Analytics In The Cloud Soa World


Data Analytics in the Cloud: Introductions

Page 4: Data Analytics In The Cloud Soa World

Tom Plunkett

Extensive Federal Government Experience

IBM Certified SOA Solution Designer

Patents

Teach OOP and Java for Virginia Tech

Page 5: Data Analytics In The Cloud Soa World

Michael Sick

Owner: Serene Software Inc. – EA Services Firm

Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture

Fascinated by technology, 15 years running

Commercial & Federal Enterprise Architect

Page 6: Data Analytics In The Cloud Soa World

Serene Software

• Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions

• Service Areas

– IT Governance

– IT Strategy

– IT Cost Containment

– Service Oriented Architectures (SOA)

– IT Solution Selection

– IT Audit & Analysis

• Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, …

• Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL

Page 7: Data Analytics In The Cloud Soa World

Draft NIST Definition of Cloud Computing

Source: Draft NIST Definition of Cloud Computing, 06/2009

A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Essential Characteristics
• On-demand self-service
• Ubiquitous network access
• Location independent resource pooling
• Rapid elasticity
• Measured Service

Delivery Models
• Cloud Software as a Service (SaaS)
• Cloud Platform as a Service (PaaS)
• Cloud Infrastructure as a Service (IaaS)

Deployment Models
• Private cloud
• Community cloud
• Public cloud
• Hybrid cloud

Page 8: Data Analytics In The Cloud Soa World

OSI Open Source Definition

Source: http://www.opensource.org/docs/osd

Free Redistribution

Source Code

Derived Works

Integrity of The Author's Source Code

No Discrimination Against Persons or Groups

No Discrimination Against Fields of Endeavor

Distribution of License

License Must Not Be Specific to a Product

License Must Not Restrict Other Software

License Must Be Technology-Neutral

Page 9: Data Analytics In The Cloud Soa World

The Open Group SOA Definition

Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632

Service-Oriented Architecture (SOA) is an architectural style that supports service orientation

Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services

Page 10: Data Analytics In The Cloud Soa World

Data Clouds & Data Grids – What's the difference?

Sources: Wikipedia & [Grossman 1]

The terms Data Cloud and Data Grid are often used interchangeably; we make the following distinctions:

Data Grids
• Grid computing system optimized to share large amounts of distributed data
• Focus on technical capabilities
• Often combined with computational grid computing systems
• Data often moved to compute grid for use
• Often oriented towards highly structured scientific data computing applications

Data Clouds
• Focus on the perception of infinite storage and computing capacity
• Focus on cost, virtualization & flexible capacity
• Enable scale-up/scale-down economics
• Data moved rarely; locality is a key feature
• Clouds have thus far focused on column-oriented, massively scalable data stores

Page 11: Data Analytics In The Cloud Soa World

Definition: Mashups

Web available resource that combines data/functions from two or more external resources

Idea of mashup efforts is to reduce the cost of producing and consuming resources

Integration should be fast, easy

Often focuses on widely available formats/protocols like RSS or Atom over HTTP
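As a rough illustration of the definition above, the following sketch merges items from two RSS feeds into a single list ordered by publication date. The feed contents, feed names, and the helper names `parse_items` and `mashup` are made up for this example; a real mashup would fetch the feeds over HTTP rather than use inline strings.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS 2.0 documents standing in for external resources.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Hadoop cluster grows</title><pubDate>Mon, 02 Nov 2009 10:00:00 GMT</pubDate></item>
<item><title>New SOA guideline</title><pubDate>Sun, 01 Nov 2009 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Cloud pricing update</title><pubDate>Tue, 03 Nov 2009 08:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_items(rss_text):
    """Extract source, title and publication date for each item in one feed."""
    channel = ET.fromstring(rss_text).find("channel")
    source = channel.findtext("title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]


def mashup(*feeds):
    """Combine the items of several feeds, newest first."""
    items = [entry for feed in feeds for entry in parse_items(feed)]
    return sorted(items, key=lambda entry: entry["published"], reverse=True)


combined = mashup(FEED_A, FEED_B)
for entry in combined:
    print(entry["published"].date(), entry["source"], entry["title"])
```

Because both inputs share a widely available format, the combining step is only a few lines, which is the cost reduction the slide refers to.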

Page 12: Data Analytics In The Cloud Soa World

Data Analytics in the Cloud: Opportunities

Page 13: Data Analytics In The Cloud Soa World

Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst

Problem Statement: Analytical tools are obsolete on deployment, while field analysts need timely, configurable data analytics. How does cloud-based data analytics (DA) meet the needs of Intelligence Community (IC) analysts?

Customer Problem
• Traditional business intelligence tools require years to develop
• Field Analysts confront situations which are rapidly changing
• Petabytes of data require analysis

Cloud Analytical Tools Solution
• Recomposable Cloud Computing Data Analytical Tools
  – Apache Hadoop
  – Mashups
  – Service-Oriented Architecture

Customer Value
• Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data

Page 14: Data Analytics In The Cloud Soa World

Why the “Buzzword” Soup? Convergence of Capabilities

Converging capabilities: Virtualization, SaaS, Free Open Source Software (FOSS), Mashups, Cloud Computing, Data Analytics

Convergence of capabilities: new opportunities in breadth and depth of DA services

• Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
• Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
• Capacity Scaling: Scaling capacity up/down in pay-as-you-go fashion makes DA available to a wider audience
• Composable UIs: Capability to assemble DA results into various interfaces

Page 15: Data Analytics In The Cloud Soa World

Early Data Analytic Cloud Consumers/Providers

Cloud DA Opportunities (services), listed by profile, type, and example companies:

Internet Scale Service Providers
• Big Internet Companies: Yahoo, Amazon – can build DA on inf.
• SaaS Companies: Force.com – DA & Warehousing to SBA’s
• Social Platforms: Facebook – sell DA access to anon. user info

Large Data-centric Traditional Co’s
• Insurers: BCBS – private clouds across consortium
• Healthcare & Biotech: Kaiser Permanente – common DA services
• Rating Agencies: S & P – open DA cloud to customers

Government Organizations
• Intelligence Community: CIA – private org-wide Cloud
• Defense Managed Services: DISA – offer DA to .mil clients
• Healthcare: SSA – offer DA to fraud prevention analysts

DAaaS Providers
• DAaaS Infrastructure: Cloudera – managed Hadoop instances
• SMB DAaaS Provider: ?? – managed DAaaS, simplified, low cost

Page 16: Data Analytics In The Cloud Soa World

Data Analytics in the Cloud: Technology & Standards

Page 17: Data Analytics In The Cloud Soa World

Google MapReduce


Algorithm for computing distributed problems using a divide and conquer approach with a cluster of nodes

The master node Maps the input into smaller sub-problems and distributes the work to the cluster. A worker node may further map the work for a further cluster of nodes. The worker nodes then process the smaller problems and return the answers to the master node.

The master node then Reduces the set of answers into the answer to the original problem.
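The map and reduce steps described above can be sketched in a few lines of Python. This is a minimal single-process illustration that follows the slide's simplified master/worker description, not Google's or Hadoop's implementation. Word counting is an assumed example problem, and the function names (`split_input`, `map_chunk`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def split_input(text, n_chunks):
    """Master step: divide the input lines into smaller sub-problems."""
    lines = text.splitlines()
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(lines):
    """Worker step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: combine the partial answers into the final answer."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(text, n_chunks=3):
    chunks = split_input(text, n_chunks)
    # On a real cluster each chunk would be mapped on a different node.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))


text = "the cloud stores data\nthe grid moves data\nthe cloud scales"
counts = word_count(text)
print(counts["the"], counts["cloud"], counts["data"])
```

The result does not depend on how the input is split, which is what allows the map work to be spread across as many nodes as are available.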

Page 18: Data Analytics In The Cloud Soa World


Apache Hadoop

Open Source implementation of the MapReduce algorithms

Hadoop can store and process petabytes of data

Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper

Yahoo (more than 100,000 CPUs in >25,000 computers running Hadoop) and other companies make extensive use of Hadoop

Page 19: Data Analytics In The Cloud Soa World

As-Is Hadoop Simplified Reference Architecture

[Architecture diagram. Components shown: Business Intelligence; ETL, Pig, Hive; Chukwa, HBase; Apache Hadoop; ZooKeeper; Structured Data; Unstructured Data]

Page 20: Data Analytics In The Cloud Soa World

Apache Hadoop Sub-projects

Each Hadoop sub-project is listed with its capabilities and an example company:

Chukwa
• Data collection system for monitoring and analyzing large distributed systems
• Example company: Yahoo

Hive
• Data warehouse infrastructure for large datasets
• Hive QL query language
• Example company: Facebook

Pig
• High-level language for data analysis
• Compiler for Map-Reduce programs
• Example company: Yahoo

ZooKeeper
• Configuration, naming, distributed synchronization, and group services
• Example company: Yahoo

HBase
• Similar to Google’s BigTable
• Distributed database for structured data
• Multi-dimensional sorted map
• Example company: Yahoo
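To make the "multi-dimensional sorted map" description of HBase more concrete, here is a toy in-memory model in which a value is addressed by row key, column, and timestamp, and rows are kept in sorted order so that key ranges can be scanned. It is only a sketch of the data model. The class `SortedMapTable`, its methods, and the sample keys are hypothetical and are not the HBase API.

```python
import bisect


class SortedMapTable:
    """Toy model of a BigTable-style map: (row, column, timestamp) -> value."""

    def __init__(self):
        self._row_keys = []  # row keys kept in sorted order
        self._rows = {}      # row key -> {column -> {timestamp -> value}}

    def put(self, row, column, timestamp, value):
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the most recent value of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, column, latest value) for start_row <= row < stop_row."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        for row in self._row_keys[lo:hi]:
            for column in sorted(self._rows[row]):
                yield row, column, self.get(row, column)


table = SortedMapTable()
table.put("user#002", "info:name", 1, "Bob")
table.put("user#001", "info:name", 1, "Alice")
table.put("user#001", "info:name", 2, "Alice B.")
table.put("video#001", "info:title", 1, "Intro")

# Scan every row whose key starts with "user#" ("$" sorts right after "#").
rows = list(table.scan("user#", "user$"))
print(rows)
```

Keeping the row keys sorted is what makes the range scan cheap: related rows sit next to each other and can be read without touching the rest of the table.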

Page 21: Data Analytics In The Cloud Soa World

Data Analytics in the Cloud: Challenges

Page 22: Data Analytics In The Cloud Soa World

To-Be Simplified Hadoop Architecture

[Architecture diagram. Components shown: Unstructured Data; Structured Data; Query Language; HBase; Apache Hadoop; ZooKeeper; Chukwa; Algorithm Library; Business Intelligence; SOAP API; REST API; Pig; Hive; ETL]

Page 23: Data Analytics In The Cloud Soa World

Key Challenges

Emerging challenges, grouped on the slide under Infrastructure, Input & Analysis, Adoption, Administration, and Output:

• Hardware: Speed of rack interconnects, multi-core
• Parallelization: Core platform, data analytic components
• Node Affinity: Make use of super nodes, XML I/O, en/de-crypt
• Cost: “Brutally efficient” pricing, FOSS advantages
• Migration Pain: Full warehouse migration, ETL
• ETL Integration: Interface, metadata optimized for ETL loading
• Intuitive APIs: Declarative & programmatic, cross-language
• Product Integration: BI, applications (SAP, Oracle Financial, Lawson)
• Data Visualization: Viewing & drill-down of very large data sets
• Intuitive APIs: Declarative & programmatic, cross-language
• Mashups/Dynamics: Easy discovery of data, functions & workflows
• Ease of Admin.: Parallel current RDBMS, warehouse admin
• Debugging: Distributed debugging, integration w/ provider
• Flexible Provisioning: Multi-level provisioning – co., dept, individual
• System Reporting: Reporting, audit trails, view to DA system
• Cost Models: Accurate, open models of CapEx, OpEx costs

Page 24: Data Analytics In The Cloud Soa World

Solutions: Projected & In-Progress

Emerging challenges, grouped on the slide under Infrastructure, Input & Analysis, Adoption, Administration, and Output, each with its projected or in-progress solution:

• Hardware: Interconnect $$ dropping, hardware maturing
• Parallelization: Platforms advance, market for components
• Node Affinity: Discovery of capability, affinity into Hadoop, …
• Cost: FOSS’s game to lose, small diff * a lot = a lot
• Migration Pain: Migration toolkits for traditional DW products
• ETL Integration: ETL interface, support of popular packages
• Intuitive APIs: SQL-like interface in core, language bindings
• Product Integration: 3rd party adaptors, IWay et al
• Data Visualization: Modeling, meta-data, traceability, and new UIs
• Intuitive APIs: SQL-like interface in core, language bindings
• Mashups/Dynamics: Generic datatypes, discovery services
• Ease of Admin.: Integrated & extended admin packages
• Debugging: Commercial distributed debugging
• Flexible Provisioning: Multi-level provisioning – co., dept, individual
• System Reporting: Reporting, audit trails, view to DA system
• Cost Models: Industry standard ROI/IRR models for CC

Page 25: Data Analytics In The Cloud Soa World

Data Analytics in the Cloud: Questions

Page 26: Data Analytics In The Cloud Soa World

Questions? & Contact Information

Principal Architect / Partner
Michael A. Sick
888.777.1847 [email protected]

Cloud Computing Architect
Tom Plunkett
888.777.1847 [email protected]

Address
Serene Software
116 19th Ave. North, Suite 503
Jacksonville Beach, FL
URL: www.serenesoftware.com
