
Qrious about Insights: Big Data in the Real World

AUT DSRG Workshop

Guy Kloss

[email protected]

Enterprise Architect, Qrious Limited

7 February 2017

The Problem Examples The Solution Tools of the Trade Boxing up a Solution Flotsam and Jetsam

Outline

1 The Problem

2 Examples

3 The Solution

4 Tools of the Trade

5 Boxing up a Solution

6 Flotsam and Jetsam

Guy Kloss | Big Data in the Real World 2/41

Who/What is Qrious?

We help New Zealand businesses and public sector organisations create value and solve their most pressing business problems by turning data into actionable insight.

Who/What is Qrious?

- Backed by Spark New Zealand
- Approx. 60 employees
- Offices in Auckland & Wellington
- Substantial investment across Data, Platform & People
- Built from the ground up (new generation technology and working principles)
- One of the largest Data Science teams in the country, with > 80% qualified to Masters & PhD level and over 60 years of combined experience
- NZ's leading data analytics specialist by 2017

Our Capabilities

- Advanced analytics
- Location insights
- Big Data platforms
- Consulting services
- BI & Warehousing

Who am I?

- Chemical Engineer (Masters)
- Rocket Scientist (German Aerospace Centre)
- Computer Scientist (PhD)
- Former lecturer (AUT)
- Lead Software Developer and Head Crypto Geek @ Mega
- Enterprise Architect at Qrious
- Dad, baseballer, diver, ... general geek!

1 The Problem

Data size

- Number of records
- Data volume

An exponentially growing data world

[Figure: Primary Memory/Disk Capacity]

An exponentially growing data world

[Figure: Relative Speeds]

Source: http://www.cs.cmu.edu/~amarp/cpu-io-gap

Size Does Matter!

- Access/processing beyond a single machine (RAM, disk, CPU)
- Expensive data transfers at volume (latency, throughput)

Storage Issues

- Storage, access, index, find
- Transfer, manage, prevent data loss

Types of Data

- Structured
- Unstructured
- Graphs
- Free text
- ...

Correlating... co-relating... mashing...

Not a single-record problem, but an m:n problem
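
Correlating two data sets is an m:n problem because each record on one side can match many records on the other. A minimal sketch of a hash join in Python, assuming two small hypothetical in-memory data sets keyed by a `cell_id` field (the field names and records are illustrative only):

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dicts on `key`; each left row pairs with every matching right row."""
    # Build phase: index the right-hand side by join key (one key -> many rows).
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe phase: each left row may match zero, one or many right rows (m:n).
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical example data: device events and cell-site attributes.
events = [
    {"cell_id": "A", "device": 1},
    {"cell_id": "A", "device": 2},
    {"cell_id": "B", "device": 1},
]
cells = [
    {"cell_id": "A", "region": "Auckland"},
    {"cell_id": "A", "region": "Auckland CBD"},
    {"cell_id": "B", "region": "Wellington"},
]

result = hash_join(events, cells, "cell_id")
print(len(result))  # 2 x 2 matches for "A" + 1 x 1 for "B" = 5
```

The output grows with the product of matching rows per key, which is why correlation at volume quickly outgrows a single machine.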

Beyond Exponential

Problems lie somewhere between exponential and hyperexponential
→ Enabling data processing in an exponential world

2 Examples

Number of Records

- > 1 billion (10^9) records: Spark's location-based data set
- Anonymised for privacy (on ingest)
- Fully encrypted (at rest and in transport)
- Continuous/stream ingestion
- Normalisation and segmentation of the data set
- Correlation with external data sets

→ Finding insights in this “hay mountain”
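
One common way to protect identifiers on ingest is a keyed hash (HMAC), so raw identifiers never reach storage while records belonging to the same subscriber can still be correlated. A minimal sketch, assuming a hypothetical secret key and record layout (`msisdn`, `cell_id`, `ts`); this is not necessarily how Qrious anonymises its data, and keyed hashing on its own is pseudonymisation rather than full anonymisation:

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key management system.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable, keyed SHA-256 digest of an identifier (hex encoded)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def ingest(record: dict) -> dict:
    """Replace the raw identifier before the record is stored (hypothetical fields)."""
    cleaned = dict(record)  # copy, so the caller's record is left untouched
    cleaned["subscriber"] = pseudonymise(cleaned.pop("msisdn"))
    return cleaned


raw = {"msisdn": "+64210000000", "cell_id": "A", "ts": 1486425600}
stored = ingest(raw)
print(stored["subscriber"][:16], stored["cell_id"])
```

Because the same key always yields the same digest, joins and segmentation still work on the pseudonymised column; rotating or destroying the key breaks that link.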

Data Volume

- 100s of TB to PB of “Data Lakes”
- Not just a backup/data grave
- Fully encrypted (at rest and in transport)
- Includes data querying and processing capability

→ Capability to “store everything” (every thing and kind)

3 The Solution

Divide and Conquer

- Massively parallel processing (MPP)
- Parallelise: Map-Reduce
- Pipelines: stream processing
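
The Map-Reduce pattern can be sketched as three phases (map, shuffle, reduce) on a toy word count. This version runs sequentially in one process, assuming hypothetical input chunks; on a cluster, each map and reduce call would run as an independent task, typically on a different machine:

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit (word, 1) for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


# Each chunk stands in for a data block that a separate worker would process.
chunks = ["big data big insights", "data in the real world", "Big Data"]

mapped = chain.from_iterable(map(map_phase, chunks))  # chunks are mapped independently
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"])  # 3 3
```

Because map calls share no state, they parallelise trivially; only the shuffle needs data to move between workers.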

Leverage Data Locality

Bring processing to the data
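
Data locality means running a task on a node that already stores its input rather than shipping the input to the task. A toy sketch of a locality-preferring scheduler, assuming a hypothetical block-to-node layout and slot counts (real cluster schedulers, including Hadoop's, are considerably more involved):

```python
def schedule(blocks, slots):
    """Assign one task per block, preferring a node that already stores the block.

    blocks: mapping block id -> list of nodes holding a replica (hypothetical layout)
    slots:  mapping node -> number of free task slots
    Returns (assignments, remote) where remote lists blocks that must be fetched.
    """
    free = dict(slots)
    assignments, remote = {}, []
    for block, replicas in blocks.items():
        # Prefer a data-local node with spare capacity: move compute, not data.
        local = [n for n in replicas if free.get(n, 0) > 0]
        if local:
            node = max(local, key=lambda n: free[n])
        else:
            # Fall back to any node with capacity; the block travels over the network.
            candidates = [n for n, f in free.items() if f > 0]
            if not candidates:
                raise RuntimeError("no free slots")
            node = max(candidates, key=lambda n: free[n])
            remote.append(block)
        free[node] -= 1
        assignments[block] = node
    return assignments, remote


blocks = {"b1": ["n1", "n2"], "b2": ["n2"], "b3": ["n1"], "b4": ["n1"]}
assignments, remote = schedule(blocks, {"n1": 2, "n2": 1, "n3": 1})
print(assignments, remote)
```

Only `b4` ends up remote here, because its sole replica node `n1` has run out of slots.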

The Right Tools

- Don’t re-invent the wheel
- Use existing high-performing tools where possible
- Available high-productivity frameworks, making use of high-level languages
- The right tool for the type of data
- Use the Source, Luke! (Leverage open-source-based tooling with a community)

The Right Data Organisation

Row vs. columnar storage
→ For analytics, columnar format is often better
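
The difference between the two layouts can be sketched in a few lines of Python, assuming a tiny hypothetical table; in real columnar file formats such as Parquet or ORC, each column is also compressed and read from disk separately, which is where most of the analytics benefit comes from:

```python
from array import array

# Hypothetical table with three records.
rows = [
    {"id": 1, "region": "AKL", "bytes": 120},
    {"id": 2, "region": "WLG", "bytes": 300},
    {"id": 3, "region": "AKL", "bytes": 80},
]

# Row layout: each record's fields are kept together (good for fetching whole records).
# Column layout: each field's values are kept together (good for scans and aggregates).
columns = {
    "id": array("q", (r["id"] for r in rows)),
    "region": [r["region"] for r in rows],
    "bytes": array("q", (r["bytes"] for r in rows)),
}

# Row layout: visit every record to pick out one field
# (on disk, a row store reads whole records to do this).
total_rows = sum(r["bytes"] for r in rows)

# Column layout: scan one contiguous array and ignore all other columns.
total_cols = sum(columns["bytes"])
print(total_rows, total_cols)  # 500 500
```

Both give the same answer; the columnar layout simply touches far less data when a query needs only a few of many columns.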

In, Out, Cha-Cha-Cha

Ingest data from (legacy, external) source systems
→ ETL – Extract, Transform, Load

Make sure the rhythm fits (no missing “Out”)
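
A minimal ETL sketch, assuming a hypothetical CSV export as the source system and an in-memory SQLite database as the target; the final count check is one simple way to make sure nothing that went “In” is missing on the way “Out”:

```python
import csv
import io
import sqlite3

# Hypothetical legacy export; in practice this would be a file or an SFTP drop.
SOURCE = """customer_id,region,spend
1, akl ,12.50
2,WLG,7
3,akl,
"""


def extract(text):
    """Extract: parse raw CSV rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise fields; a missing spend becomes 0.0 (an assumption)."""
    return [
        (int(r["customer_id"]), r["region"].strip().upper(), float(r["spend"] or 0.0))
        for r in rows
    ]


def load(records, conn):
    """Load: write into the target table inside one transaction."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS spend (customer_id INTEGER, region TEXT, spend REAL)"
        )
        conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", records)


conn = sqlite3.connect(":memory:")
extracted = extract(SOURCE)
load(transform(extracted), conn)

# Reconcile: every record that went "In" must also come "Out".
(loaded,) = conn.execute("SELECT COUNT(*) FROM spend").fetchone()
assert loaded == len(extracted), "rows lost between extract and load"
print(loaded)  # 3
```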

4 Tools of the Trade

Hadoop

- Hadoop and its distributions
- Processing tools for relational, streaming, batch, graph, text, search, ... workloads
- Allocates cluster resources dynamically
- Data is distributed (with redundancy), so compute is allocated where the data is
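
Distributing data with redundancy typically means splitting files into blocks and storing each block on several nodes. A simplified sketch, assuming a replication factor of 3 and plain round-robin placement over hypothetical nodes (HDFS uses its own rack-aware placement policy, not this one):

```python
def split_blocks(data: bytes, block_size: int):
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a healthy node."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


blocks = split_blocks(b"x" * 1000, block_size=256)  # 4 blocks
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
print(len(blocks), readable_after_failure(placement, {"n2"}))  # 4 True
```

The same replicas that protect against node loss also give the scheduler several data-local candidates per block.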

Hadoop Distributions

- Many Hadoop distributions, similar to Linux distributions
- Cloudera partnership with Qrious:
  - “Bronze” partner
  - Ambitions to become “Silver” partner and MSP (managed service provider)

Basic Hadoop Tool Suite

[Figure: Example: Cloudera Hadoop Distribution]

MPP Databases

- DB for massively parallel processing (MPP)
- Greenplum database and forks (based on PostgreSQL)
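
In an MPP database, rows are spread across segments by hashing a distribution key; each segment aggregates its own rows and a coordinator merges the partial results. A single-process simulation of that pattern, assuming four hypothetical segments and `zlib.crc32` as the hash function (Greenplum's actual hashing and executor differ):

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key: str) -> int:
    """Pick a segment by hashing the distribution key (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def distribute(rows, key):
    """Spread rows across segments so that equal keys always land together."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key])].append(row)
    return segments


def local_sum(rows, group, value):
    """Each segment aggregates only its own rows (in parallel in a real MPP DB)."""
    partial = defaultdict(float)
    for row in rows:
        partial[row[group]] += row[value]
    return partial


def gather(partials):
    """The coordinator merges the small partial results."""
    total = defaultdict(float)
    for partial in partials:
        for k, v in partial.items():
            total[k] += v
    return dict(total)


rows = [{"customer": f"c{i % 10}", "spend": float(i)} for i in range(100)]
segments = distribute(rows, "customer")
result = gather(local_sum(seg, "customer", "spend") for seg in segments.values())
print(result["c0"])  # 0 + 10 + ... + 90 = 450.0
```

Only the compact partial results cross the network, not the raw rows.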

Generic and Specialised DBs

- Generic RDBMS (where useful)
- NoSQL
- Graph DB
- Other columnar species

5 Boxing up a Solution

Delivering a Suitable Solution

Includes:
- System management
- Connectivity
- Application logic
- Services
- Yummy add-ons

System Management Framework

- Security
  - Dedicated sub-networks with specific firewall rules
  - External firewalls
  - User and credentials management
  - Log collector
  - Other security tools ...

- System access
  - VPN
  - Remote desktop services

Connectivity

- API gateways
- (Reverse) proxies
- SFTP
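
An API gateway or reverse proxy maps public request paths to internal services. A minimal sketch of longest-prefix routing, assuming a hypothetical route table and service names; production gateways add authentication, TLS termination and rate limiting on top:

```python
# Hypothetical route table: public path prefix -> internal upstream service.
ROUTES = {
    "/api/": "http://api-backend:8080/",
    "/api/insights/": "http://insights-service:9000/",
    "/static/": "http://static-cache:8081/",
}


def route(path: str) -> str:
    """Return the upstream URL for `path` using longest-prefix matching."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path}")
    prefix = max(matches, key=len)  # the most specific route wins
    # Strip the public prefix and forward the remainder to the upstream.
    return ROUTES[prefix] + path[len(prefix):]


print(route("/api/insights/daily"))  # http://insights-service:9000/daily
print(route("/api/users/42"))  # http://api-backend:8080/users/42
```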

Application Logic

- Platform-as-a-Service (PaaS)
- Huge benefits of containerising application logic (using Docker)
  → Much shorter delivery cycles
- APIs, micro-services
- Orchestration of Big Data analysis

Services

- Solutioning, build
- Analytics and development
- Operation and maintenance

Bonus Points for...

- Provenance (reproducibility, auditability, compliance)
- AI and ML
- Blockchain (non-repudiation, trust, “smart contracts”, identity management, federation, ...)
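
Provenance and the “trust” aspect of blockchains both rest on tamper-evident records. A minimal sketch of a hash-chained provenance log, assuming hypothetical event fields; it only provides tamper evidence (non-repudiation would additionally require digital signatures) and it is not a distributed ledger:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event, prev_hash):
    """Hash the event together with the previous entry's hash."""
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def add_entry(log, event):
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


log = []
add_entry(log, {"step": "ingest", "source": "feed-a", "rows": 1000})
add_entry(log, {"step": "transform", "script": "normalise.py"})
print(verify(log))  # True
log[0]["event"]["rows"] = 999  # tamper with history
print(verify(log))  # False
```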

6 Flotsam and Jetsam

In the Qrious Pipeline

Make Big Data a commodity: don’t buy, pay for what you need!
→ Big-Data-as-a-Service (BDPaaS)

- Sliced, diced and configured to your needs
- Straight on bare metal, not VMs (like most cloud hosting providers)

Maximising the Job Market

What skills do you need?
- RDBMS?
- SAS?
- NoSQL DBs?
- Maybe Hadoop is a good answer?

Questions?

Parallelise!

Guy Kloss
[email protected]

Just a humble hair dryer from the ’30s: “One of the first machines used for permanent wave hairstyling back in the 1920’s and 1930’s.”
Dark Roasted Blend: http://www.darkroastedblend.com/2007/05/mystery-devices-issue-2.html
