©2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Present and Future of Enterprise BI
January 17, 2013
Prepared for DAMA
Agenda
1. DB engines for BI, contrasted with OLTP DB engines
   • Row DB, column DB, indexed/non-indexed, in-memory
   • Contrasting the BI workload with the OLTP/ERP workload
   • Our experience
2. MPP systems for BI: shared-nothing and shared-disk
   • IQ, HANA, Exadata, Netezza, MySQL
   • Our experience
3. Including unstructured data ("Big Data") in BI
   • Our experience
4. Designing the "dream" BI system
5. Comparing various "real" and "dream" BI systems
   • Facts and our experience
6. Brief EDMT BI overview
7. Open discussion
Background
BMMsoft offers consulting and BI products/solutions for:
• DW/DM and ETL on Sybase IQ, HANA, Oracle, Exadata, Netezza, MySQL
• Extending enterprise BI with Big Data
• HA, SLA ("how many nines?"), DR, and B/R solutions for BI systems
• Scale and speed: "World's Largest DW" (2002, 2004, 2007) and "World's Fastest Data Loader" (2011, 330 TB/day on HP DL980)
Paul Krneta, CTO of BMMsoft:
• 20 years of industry experience in computer and database technology and architecture
• CTO of Sybase IQ, 2000-2007
  • Architected the MPP option for Sybase IQ ("IQ Multiplex")
  • Designed NonStopIQ (HA, DR, and B/R for the VLDB version of Sybase IQ)
  • Optimized IQ for VLDB; certified IQ three times as the "World's Largest Data Warehouse":
    • 2002: 48 TB (200 B rows)
    • 2004: 150 TB (1 T rows)
    • 2007: 1,030 TB (1 PB in 6 T rows) of structured and (optionally) unstructured data
• Technical Director for DB Technology at Digital Equipment (DEC), 1994-2000
  • Designed the first in-memory DB: Oracle VLM Option ("Very Large Memory"), 1995
  • 1 TB/hour live backup of Oracle, Sybase, Informix, SQL Server, and Adabas, 1995-1996
DB LANDSCAPE: BI VS. OLTP
Categories of DBs for BI
Different DB architectures:
1. R = row-oriented DB
2. C = columnar DB
3. H-RC = hybrid row + columnar DB
4. Compression
5. NI = non-indexed DB
6. I = indexed DB
7. MPP-SN = shared-nothing DB
8. MPP-SD = shared-disk DB
9. In-memory DB
10. SQL, NoSQL, object, and KV-pair DBs
11. ACID and non-ACID ("unreliable") DBs
12. HA, DR, B/R, Test/Dev
13. BLOB storage: in-row/column, separate store, or external BLOB
14. Text search: in-DB or external
15. UDFs
16. Storage efficiency ("green")
Types of queries:
1. Pin-point query
   • Interested in a small number of rows selected from billions or trillions of rows (e.g., call center, ATM)
2. Analytic query
   • Analyzes millions, billions, or trillions of rows (1%-100% of the entire DB)
3. Mixed search of structured + unstructured data
   • A single query cross-searches SQL rows and text
   • Best: both parts run in a single engine
   • Worst: two engines (one SQL, one text), a relic of the "divided SQL/text world"
4. Text search/analytics
5. OLTP (heavy updates)
1-5 day sessions: “BI for Today and Tomorrow”, “NonStopIQ bootcamp”, “BI Assessment” http://download.sybase.com/presentation/TW2005/AM21.pdf
Quick overview of “original” row DB
1. A record ("row") has multiple fields (e.g., date, name, amount)
   1) Fields of a row are placed next to each other (on disk and in RAM)
   2) Each field (typically) has a single value
   3) The order of fields in the DDL is (mostly) irrelevant
   4) To get to the Nth field in a row, the DB "scans" each of the previous N-1 fields (see the sketch below)
2. A DB page contains multiple (unrelated) records
   1) The DB page is the unit of storage management, I/O, and caching in RAM
   2) A row (typically) can't span multiple pages, which limits the number of fields and the row length
3. ACID is applied all the time
4. Locking is at the record level
5. A small number of fields are indexed
[Figure: row DB page layout. A DB page ("block") of 2-32 KB holds rows 1, 2, 3, 4, ..., 100; example schema: CREATE TABLE ABC (yellow, blue, red, magenta); example query: SELECT SUM(red) FROM ABC]
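To make point 1.4 concrete, here is a minimal Python sketch (illustrative only; real engines use binary page formats and offset tables): with variable-length fields packed side by side, reaching the Nth field means walking past every field before it.

```python
# Minimal sketch of a row-store layout: each row is one byte string of
# length-prefixed, variable-width fields packed next to each other.
import struct

def pack_row(*fields):
    """Pack string fields into one contiguous row: [len][bytes][len][bytes]..."""
    out = b""
    for f in fields:
        data = f.encode()
        out += struct.pack("<I", len(data)) + data
    return out

def read_field(row, n):
    """To reach field n, the engine must skip over every field before it."""
    offset = 0
    for _ in range(n):
        (length,) = struct.unpack_from("<I", row, offset)
        offset += 4 + length                  # skip length header + payload
    (length,) = struct.unpack_from("<I", row, offset)
    return row[offset + 4 : offset + 4 + length].decode()

page = [pack_row("2013-01-17", "Alice", "42.50"),   # one "DB page" holding
        pack_row("2013-01-18", "Bob", "17.00")]     # multiple unrelated rows

print(read_field(page[0], 2))   # "42.50" -- walked past date and name first
```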
Row-DB vs. Columnar DB
[Figure: the same table ABC (yellow, blue, red, magenta) stored in a row DB vs. a columnar DB; query: SELECT SUM(red) FROM ABC]
1. Both use ANSI SQL and ODBC/JDBC
2. Column structure (invisible to apps and admins)
   • Reduces I/O by 90-99% (eliminates full-table scans; sketched below)
   • Flexible schema: add/remove columns on the fly
   • Wide tables allow simple, rich schemas (e.g., 42,000+ columns)
   • Large I/O can use large (400+ GB), low-cost disks
   • Great match for BLOB data (images, video, email, documents...)
3. All row DBs have indices (they are almost unusable without them)
4. Column DBs with indices: bitmap, bit-wise, text, and more
   • Column + index queries run 2x-1,000x faster than in a "classic" DBMS
   • Fast to load, small size, full data statistics
5. Data compression: ~90% cost reduction
   • A row DB is 4x-10x larger than a column DB
   • Disks for a row DB cost 8x-20x more
   • Fast, no fragmentation, always on, no LVM or FS needed
6. Multi-node
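A hedged sketch of why SELECT SUM(red) favors the columnar layout (a pure-Python model of the two layouts above, not any vendor's format): the row store must touch every field of every row, while the column store reads only the one column the query needs.

```python
# The same four-column table ABC in the two layouts.
rows = [("y1", "b1", 10, "m1"),   # row store: fields of a row are adjacent
        ("y2", "b2", 20, "m2"),
        ("y3", "b3", 30, "m3")]

columns = {                        # column store: values of a column are adjacent
    "yellow":  ["y1", "y2", "y3"],
    "blue":    ["b1", "b2", "b3"],
    "red":     [10, 20, 30],
    "magenta": ["m1", "m2", "m3"],
}

# SELECT SUM(red) FROM ABC
row_answer = sum(r[2] for r in rows)   # touches all 4 fields of all 3 rows
col_answer = sum(columns["red"])       # touches 1 of 4 columns: ~75% less I/O
assert row_answer == col_answer == 60
```

With hundreds of columns (the slides mention tables with 42,000+), the fraction of the table a single-column aggregate must read shrinks accordingly.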
BI/DW vs. OLTP
[Chart: speed and scalability (# of users & data size), log scale 1-10,000; in-memory DB (HANA) at the top; VLDB at the high end of data size]
OLTP = simple queries
• "Touch/update" tens of rows per query
• A query takes seconds and few resources
• Simple SQL statements

DSS = complex queries
• "Touch" thousands to millions (even billions to trillions) of rows
• A query takes seconds to hours to finish
• Complex (10-page) SQL statements
• The DB is (typically) 10x larger than an OLTP DBMS
To Index or Not to Index?
1. Row DB: an index is critical to avoid slow, costly full-table scans
   • Reduces I/O by 90-99% (eliminates full-table scans)
2. Column DB without indices
   • Every query scans column(s): slow, with heavy I/O and CPU load
   • Complex queries scan many columns (= much of the DB)
   • May be faster to load (but not by much)
   • Uses less space (but needs faster disks for the scans)
3. Column DB with indices: bitmap, bit-wise, text, and more (see the bitmap sketch below)
   • Many queries use the index only (= fast, low I/O and CPU use)
   • Indices carry statistics about the data = better query execution plans (QEP)
   • No scans = reduced I/O
   • Large I/O = can use large (4 TB), low-cost disks ($400/TB)
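As an illustration of point 3 above, here is a toy bitmap index in Python (an assumption-laden sketch: low-cardinality columns and no compression; production bitmap indices add run-length encoding and more). A conjunctive predicate is answered with one bitwise AND and a popcount, with no table scan at all.

```python
# Toy bitmap index: one bit-vector per distinct value of a column.
from collections import defaultdict

def build_bitmap_index(column):
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id    # set bit row_id in that value's bitmap
    return index

status = ["open", "closed", "open", "open", "closed"]
region = ["EU",   "EU",     "US",   "EU",   "US"]
idx_status = build_bitmap_index(status)
idx_region = build_bitmap_index(region)

# SELECT COUNT(*) WHERE status = 'open' AND region = 'EU'
hits = idx_status["open"] & idx_region["EU"]   # one bitwise AND, no scan
print(bin(hits).count("1"))                    # 2  (rows 0 and 3 qualify)
```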
BI: Reporting vs. Advanced (Ad-hoc)
[Chart: speed and scalability (# of users & data size), log scale 1-10,000; column DB with index and in-memory DB (SAP HANA) at the top; VLDB at the high end]
Reporting
• "Interested" in many rows per query
• Predictable queries

Advanced (ad-hoc) queries
• "Touch" thousands to millions (even billions to trillions) of rows
• A query takes seconds to hours to finish
• Unpredictable, complex queries
BI: Data Scalability
[Chart: speed and scalability (# of users & data size) vs. DB size (TB, PB) and # of columns, log scale 1-10,000; column DB with index and in-memory DB (HANA) scale furthest, ahead of "row" DBs and "row" DBs with HW "column" filters]
BI: Resource Consumption
[Chart: resource usage (CPU, RAM, IOPS, and bandwidth) vs. DB size (TB, PB) and # of columns, log scale 1-10,000]
BI: Speed and Efficiency
[Chart: performance and resource efficiency (CPU, RAM, I/O), log scale 1-10,000, for predictable/static vs. unpredictable data and queries; column DB with index and in-memory DB (HANA) at the top]
In-memory DB (OLTP & BI): SAP HANA
1. HANA: ANSI SQL and ODBC/JDBC
2. HANA: compression is always on, 5:1-20:1 (dictionary-encoding sketch below)
   • A single HANA server (4 TB RAM) can hold 15-60 TB of data
   • No transactional I/O to disk (except the log file and start/stop)
   • Row or column store, chosen at the table level
3. HANA is much more than a DB cache in RAM
   • Data access is optimized for RAM
   • Supports multi-node configurations
   • 100s to 1,000s of times faster than a "standard" on-disk row DB
4. HANA: an in-RAM DB for 0.1-50 TB of data (even more)
   • Good fit for complex, real-time BI/OLTP/ERP workloads
   • Benefits from cheap/big RAM and fast CPUs
   • Pricey ("too fast"?) for huge "warm/cold" data (100+ TB?)
   • HANA + IQ = a good mix of in-memory and on-disk DB
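The 5:1-20:1 compression claim rests largely on dictionary encoding; a minimal sketch of the idea (illustrative only; HANA's actual encodings are more elaborate): each distinct value is stored once, and the in-RAM column becomes an array of small integer codes that queries can operate on directly.

```python
# Dictionary-encode a repetitive column: store each distinct value once,
# keep only a small integer code per row.
column = ["DE", "US", "DE", "DE", "FR", "US"] * 100_000

dictionary = sorted(set(column))                   # ["DE", "FR", "US"]
code_of = {v: i for i, v in enumerate(dictionary)}
codes = bytes(code_of[v] for v in column)          # 1 byte per row here

raw_bytes = sum(len(v.encode()) for v in column)
print(raw_bytes / len(codes))   # 2.0x here; long, repetitive values do far better

# Predicates run directly on the codes, e.g. COUNT(*) WHERE country = 'DE':
print(codes.count(code_of["DE"]))                  # 300000
```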
MASSIVELY PARALLEL PROCESSING (MPP, "DIVIDE AND CONQUER"): SHARED-NOTHING VS. SHARED-DISK
3 ways to add more CPU and storage
There are 3 ways to add more CPU power and storage:
1. Use a larger server (more CPUs, RAM, I/O channels)
   1. Limited by the size of the largest SMP server (128 cores, maybe 512 cores)
   2. Can be expensive
   3. HA and DR can be expensive
2. Divide the data into many small partitions (MPP Shared-Nothing, or MPP S-N); see the scatter-gather sketch after this list
   1. Add a server ("node") to "own" and process each data partition
   2. Node = server + data "slice": adding a server requires adding storage, and vice versa
   3. A query has to be spread to every node
   4. Results have to be collected and merged
   5. Simple to implement, but has some drawbacks
3. Many servers access shared data and process it (MPP Shared-Disk, or MPP S-D)
   1. Optimal for an indexed column DB because of its low I/O
   2. Difficult to implement, but smart and flexible to use
   3. Suboptimal for row DBs, scanning DBs, or storage HW filters: all need heavy I/O
   4. Servers can be added without affecting storage
   5. Storage can be added without affecting servers
   6. Architectural HA: a server crash does not affect data access
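A toy Python sketch of the shared-nothing "divide and conquer" flow in option 2 (a single process standing in for a cluster; the partitioning scheme and names are illustrative): rows are hash-partitioned across nodes, the query is shipped to every node, and the partial results are gathered and merged by a coordinator.

```python
# Toy MPP shared-nothing: each "node" owns one hash partition of the rows.
N_NODES = 4
nodes = [[] for _ in range(N_NODES)]

for key, amount in [(1, 10.0), (2, 5.0), (3, 7.5), (4, 2.5), (5, 20.0)]:
    nodes[hash(key) % N_NODES].append((key, amount))   # partition by key

def node_query(partition):
    """Runs locally on each node, against only its own slice of the data."""
    return sum(amount for _, amount in partition)

# Coordinator: scatter the query to every node, then gather and merge.
partials = [node_query(p) for p in nodes]
print(sum(partials))   # 45.0 -- SELECT SUM(amount), computed in pieces
```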
MPP S-N (“shared nothing”)
[Diagram: five shared-nothing nodes; each server (A-E) exclusively owns its own set of five 36 TB arrays]
Add/remove node: takes significant time; a new node starts "empty" and data must be redistributed from the other nodes. Add storage: takes significant time; must take data from other nodes. Remove storage: hours/days; data must be redistributed to the other nodes. (A toy illustration of this redistribution cost follows below.)
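A toy illustration of why that redistribution is expensive (naive modulo placement is assumed here; consistent hashing reduces, but does not eliminate, the movement): growing the cluster changes the home node of most keys, so their rows must physically move.

```python
# Adding a node to a hash-partitioned shared-nothing cluster: the placement
# of almost every key changes, so its data must move between nodes.
keys = range(10_000)

before = {k: hash(k) % 4 for k in keys}   # 4-node cluster
after  = {k: hash(k) % 5 for k in keys}   # grown to 5 nodes

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(before):.0%} of rows must be redistributed")   # 80%
```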
MPP S-D (“shared disk”)
Scalable performance and data, flexible configuration
[Diagram: five DL980 servers (A-E), all connected through an FC switch to the same shared pool of 36 TB arrays]
Add/remove server: < 1 min. Add storage: < 1 min. Remove storage: < 1 min (*).
MPP: S-N vs. S-D
Current BI and Big Data: servers + storage are "sold together"
[Diagram: MPP S-D shown covering the small-data/many-CPUs corner]

MPP S-D (indexed) vs. MPP S-N: flexibly combining storage and servers
[Diagram: MPP S-N sits at one fixed server-to-storage ratio, while MPP S-D (indexed) can be configured high-CPU/low-data, high-CPU/high-data, low-CPU/high-data, or low-CPU/low-data]
MPP S-N, S-D, C-non-indexed and C-indexed: Sybase IQ/EDMT 4XL (Full Rack)
• MPP Shared-Disk
• 160 Intel E7-4870 cores (2.4 GHz); no need for a HW filter
• 100+ TB/sec (indexed, no scans)
• 30+ TB/hr load rate
• 432 TB of storage, expandable to 1,000+ TB
• Scales to 96 racks (500+ custom) and 15,360 cores (700,000+ custom)
• On-line addition or removal of nodes (S-N, by contrast, requires reorganization/repartitioning of data when nodes are added or removed)
http://www.zdnet.com/blog/btl/emcs-launches-greenplum-appliance/40281
MPP S-N: HANA (in-memory DB)
[Diagram: HANA as an MPP shared-nothing cluster of servers A-E, each holding its own slice of the data in RAM]
ADDING UNSTRUCTURED DATA TO BI, STORING TBS AND PBS OF DATA, TEXT SEARCH
Adding unstructured data to BI: Load/Storage and Cross-Analysis
Problem 1: Load and Store
Load + store = too much for IT
1. Volume = too big: 100s of TB, multi-PB
2. Volume = too many items: billions and trillions
3. Variety = too many different data types
4. Velocity = slow loading and indexing of data
5. The cost of data storage is high

Problem 2: Cross-Analysis
No cross-analysis of SQL and text
1. BI = SQL analysis only (no text)
2. Text analysis = text only, no SQL
3. No cross-analysis of SQL and text data (at large scale)
Storing 1 PB in Hadoop (default config)
• Hadoop node: 8 TB of data per node (24 TB raw, with 3x copies)
• Node = 8-core Xeon, 16 GB RAM, 12x 2 TB disks, 2 RU = $4K
• HW = 125 nodes (6 racks), 3 PB raw, 1,000 disks = $500,000
• Power = 125 kW (incl. A/C) = $109,500/year (at $0.10/kWh)
• ~600 tons of CO2 per year (≈ 120 cars)
(The sizing arithmetic is re-derived below.)
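The node count and the power bill follow from simple arithmetic; a short re-derivation using the slide's own assumptions (8 TB usable per node after 3x replication, a 125 kW draw including cooling, $0.10/kWh):

```python
usable_tb = 1_000            # 1 PB of user data to store
tb_per_node = 8              # usable TB per node (24 TB raw / 3x replication)
print(usable_tb / tb_per_node)        # 125.0 nodes

kw = 125                     # cluster draw incl. A/C, per the slide
hours_per_year = 24 * 365
print(kw * hours_per_year * 0.10)     # 109500.0 -> $109,500/year at $0.10/kWh
```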
Storing 10 PB in Hadoop (default config)
• 1,200 servers, 12,000 disks, 60 racks, $5M ($4K/node)
• 1,200 kW = $1.1M/year in electricity (at $0.10/kWh)
• ~6,000 tons of CO2 per year (≈ 1,200 cars)
OPERATIONS: HA, B/R, DR, UPGRADES, LIFECYCLE, AND MORE
HA, DR and Backup/Restore?
1. HA and DR are tricky for MPP S-N
2. MPP S-D handles HA, failures, and change more easily, but still needs a plan
3. Text engines: HA, DR and B/R for BI engines is an afterthought
4. Tapes? Not a good medium: very slow
[Table (lost in extraction): uptime ("nines") vs. downtime per year; see the sketch below]
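The "how many nines?" mapping is pure arithmetic, reproduced here as a short sketch in place of the table lost in extraction:

```python
# Allowed downtime per year for each availability level ("nines").
HOURS_PER_YEAR = 24 * 365

for uptime in (0.99, 0.999, 0.9999, 0.99999):
    downtime_min = HOURS_PER_YEAR * (1 - uptime) * 60
    print(f"{uptime:.3%} uptime -> {downtime_min:10.1f} min/year of downtime")
# 99% -> ~87.6 h; 99.9% -> ~8.8 h; 99.99% -> ~53 min; 99.999% -> ~5.3 min
```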
Specialists in HA/DR for MPP
1. Some of the world's largest DWs use NonStopIQ
2. Zero-downtime backup
3. Near-zero-downtime restore
4. Full DR and HA
5. A storage cost of $400/TB (HP P2000 MSA) opens new possibilities
6. Tapes? Should you even bother when storage costs $400/TB?
Building Large BI since 2002
DESIGNING THE “DREAM” BI
Dream BI
1. A fast, scalable, and flexible BI engine
   1. Speed: query and data-loading speed
   2. Scales well with data volume, query complexity, and # of users
   3. Flexible configuration: add/remove storage and servers as needed
   4. Compatible with 3rd-party enterprise reporting and analytic tools
2. Integrates rich text search into BI queries
   1. Easy, cost-free inclusion of text search in BI analytics
   2. Fast loading of text data, without jeopardizing existing SQL data
3. Able to store large volumes of structured and unstructured data
   1. A "deep history" of SQL data and unstructured data
4. HA, DR, B/R, ACID, flexibility, etc.
5. Price: affordable and comparable with open source
BI/DW Analytics + Text Search & Analytics + Big Data Store ("Archive") = Dream BI Solution
EDMT SOLUTION
Terminology
EDMT stands for:
• Emails (any type of communication: email, SMS, Skype...)
• Documents (100s of file and document formats)
• Multimedia (images, audio, video, and more)
• Transactions ("standard" DB records)
EDMT Solution: A Pragmatic Approach to Data

Store data: the EDMT Solution stores emails, SMS, documents, multimedia, and DB transactions in an RDBMS (e.g., IQ) for data retention and mixed BI + text analysis.

SQL + text analysis of all data: EDMT cross-analyzes all data using SQL + text analysis to run fraud detection, e-Discovery, CRM, audit, GRC, BI, etc., 10x, 100x, or 1,000x faster than before. (A toy sketch of such a cross-query follows below.)
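Conceptually, such a cross-query is a join between structured rows and a full-text hit list, executed inside one engine. A toy Python sketch of the idea (all names and data here are hypothetical, not EDMT's API):

```python
# Toy single-engine cross-analysis: one "query" touching DB rows AND text.
transactions = [                      # structured side (hypothetical rows)
    {"txn": 1, "account": "A-17", "amount": 9_900.0},
    {"txn": 2, "account": "B-02", "amount": 120.0},
]
emails = [                            # unstructured side (hypothetical docs)
    {"account": "A-17", "body": "please split the wire below the reporting limit"},
    {"account": "B-02", "body": "thanks for lunch"},
]

# Text predicate: which accounts have a matching phrase anywhere?
suspicious = {e["account"] for e in emails if "split the wire" in e["body"]}

# SQL-style predicate AND text predicate, evaluated together:
for t in transactions:
    if t["amount"] > 5_000 and t["account"] in suspicious:
        print("flag for fraud review:", t)    # only txn 1 qualifies
```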
BI/DW Analytics + Text Search & Analytics + Big Data Store (Archive) = Dream BI Solution? EDMT solves what others cannot.
EDMT: Big Data 2.0
Innovating data technology
• Enables BI systems to store and analyze unstructured data
• Broad DB support, covering all DB architectures:
  • R = row-oriented
  • C = columnar
  • I = indexed
  • NI = non-indexed
  • SD = MPP shared-disk
  • SN = MPP shared-nothing
• OS:
  • Certified: Linux, HP-UX (incl. Poulson)
  • Verified: AIX, Solaris, Windows
• Supported DBs: SAP Sybase IQ (C-I-SD), SAP Sybase ASE (R-I-SD), Oracle RAC (R-I-SD), Netezza (R-F-NI-SN), Oracle Exadata (R-F-I-SD), MySQL (R-I), SAP HANA (RC-I-SN, Q1 '13)
2007: 1 Petabyte EDMT Solution (EDMT Big Data 2.0)
• 1 PB of data (= 6 trillion rows) loaded and indexed
• Loading speed: 285 B rows per day (= 35 TB/day)
• Load latency: < 2 sec
• Pin-point search of 6 T rows = 0.5 sec
• DB = Sybase IQ
2012: 1 PB + new HW = a PB for the masses
[Photo: 40-core DL980; half of the rack is empty]
• Same data capacity and speed as the 2007 "1 PB" system
  1. 1/15th the physical size, cost, electricity, and weight
  2. Deploys in 1 week
  3. 288 TB of raw storage (~$115,000 at $400/TB)
  4. 40-core Xeon Linux server
• Price: SW + HW = ~$500,000; amount of data stored = 1,030 TB; $/TB of data = ~$480
EDMT Architecture
Innovating data technology
[Diagram: EDMT architecture]
• ETL and application servers (Linux, HP-UX): real-time ETL parser (ingest), metadata manager, parallel loader; ETL storage
• Database servers (Linux x86): IQ, HANA, Oracle, Exadata, Netezza, MySQL; DB storage; EDMT server and EDMT HW
• EDMT modules: data management, access control, alerts, auto-classification, collaboration, taxonomy, data retention, connectivity, search API; EDMT API & connectors
• EDMT data access & analysis layer: EDMT GUI, web services, data export, mobile GUI proxy, eDiscovery/audit/fraud modules, social network analysis
2012: 1 Petabyte EDMT for the masses
• Out-of-the-box features of EDMT:
  1. Enterprise BI engines (IQ/HANA: SQL, ACID)
  2. Connectors for Business Objects, Cognos, etc.
  3. Complex data reporting and visualization
  4. eDiscovery, litigation hold, audit, compliance
  5. Full-text, proximity, and dictionary search
  6. FINRA post-review and random-sampling workflow
  7. Cross-analysis of structured + unstructured data
  8. Email + file archive, indexing & auto-categorization
  9. Multimedia archiving, indexing, and auto-categorization
  10. DB record analytics and archiving
  11. Retention, WORM, and records management
• Price: SW + HW = ~$500,000, or ~$480/TB of data
BI/DW Analytics + Text Search & Analytics + Big Data Store (Archive) = Dream BI Solution
[Photo: 40-core DL980; half of the rack is empty]
EDMT systems
EDMT Big Data Appliance: Certified and Pre-Configured (and beyond...)

EDMT® Solution Models and Specifications

#  Model [size]       Cores   Disk (TB)   Emails & files,          Emails & files,        DB rows
                                           store+index (100 KB)     index only (100 KB)    (150-byte)
7  16K [96 racks]     15,360   41,472      180 B                    1,800 B                640 trillion
6  4K  [24 racks]      3,840   10,368       48 B                      480 B                160 trillion
5  1K  [6 racks]         960    2,592       12 B                      120 B                 42 trillion
4  4XL [full rack]       160      432        2 B                       20 B                  7 trillion
3  PB  [1/2 rack]         80      288      1.6 B                       16 B                  6 trillion
2  XL  [1/3 rack]         40      144      600 M                        6 B                  2 trillion
1  L   [1/4 rack]         24       72      300 M                        3 B                  1 trillion
-  M   [2 RU]             12       36      150 M                      1.5 B                500 B
-  S   [2 RU]              6       36      150 M                      1.5 B                500 B
-  XS  [2 RU]              4       36      150 M                      1.5 B                500 B

(Models group into Entry, Mid, and High tiers; see the following slides.)

Configuration Rule 1: two or more EDMT® Solutions can be combined into one larger EDMT Solution.
Configuration Rule 2: storage can grow in 36 TB increments (one "Array" = $14,000, or $400/TB).
EDMT Big Data Appliance: Certified and Pre-Configured
• Entry level: hardware valued at $125,000 to $250,000 (US list price)
  • Larger config: 1.6 B emails & files (store + index, 100 KB each), 16 B (index only), 6 trillion DB rows (150-byte)
  • Smaller config: 600 M emails & files (store + index), 6 B (index only), 2 trillion DB rows
EDMT Big Data Appliance: Certified and Pre-Configured
• Mid level: hardware valued at $350K to $2,000,000 (US list price)
  • Larger config: 12 B emails & files (store + index, 100 KB each), 120 B (index only), 42 trillion DB rows (150-byte)
  • Smaller config: 2 B emails & files (store + index), 20 B (index only), 7 trillion DB rows
EDMT Big Data Appliance: Certified and Pre-Configured
• High level: hardware valued at $8M (US); the 4x-larger 16K system at $30M (US)
  • 4K config: 48 B emails & files (store + index, 100 KB each), 480 B (index only), 160 trillion DB rows (150-byte)
  • 16K config: 180 B emails & files (store + index), 1,800 B (index only), 640 trillion DB rows
EDMT Big Data Appliance: Certified and Pre-Configured
• Highest level: EDMT supports up to 12,000 nodes
Federated EDMT using IQ and HANA
[Diagram: EDMT 1 PB on IQ (1 server / 80 cores / 1 TB RAM; 1/2 rack, 288 TB of disks; IQ on HP-UX or a DL980; ~$500K HW+SW) federated through a switch with a 1 PB HANA cluster (12+ racks, price TBD) and its disks]
• 1 PB of raw data, 6 trillion rows, star schema
• Load: 285 B rows/day
• Search of 6 T rows = 0.5 sec
• 50 concurrent streams
More info about the 1 PB HANA system: http://www.saphana.com/community/blogs/blog/2012/11/12/the-sap-hana-one-petabyte-test
EDMT: Federated IQ/HANA vs. Size/Speed
[Chart: speed (log scale 1-10,000; low/med/high) vs. data size (small < 100 TB, medium 100 TB to 1 PB, large 1+ PB) for EDMT on IQ, EDMT on HANA, and EDMT on HANA+IQ]
Multi-site DR with NonStopEDMT (2010)
[Diagram: primary site and remote DR site]
• Server 1: 8-core Xeon, Linux; internal 10.26.51.61 [hqetl01], external 216.207.70.33; ports: SMTP & HTTP
• Server 2: 8-core Xeon, Linux; internal 10.26.51.65 [hqetl02], external 216.207.70.32; ports: SMTP & HTTP
• Server 3: PowerExpress 520, AIX; internal 10.26.51.62 [hqiq01]; hosts IQ 1 (IQ 2 and IQ 3 at the remote site)
• EDMT nodes: Node 1 10.26.51.35 [hqatg05], Node 2 10.26.51.36 [hqatg06], Node 3 10.26.51.37 [hqatg07]; EDMT 1 and EDMT 2 local, EDMT 3 remote
• SAN with staging areas Staging_1 through Staging_4
Storing 1 PB in EDMT & Hadoop
[Diagram: EDMT 1 PB (1/2 rack): ~$450K, 10 kW ($9K/year). Commodity building blocks: 96 TB of storage at $20,090 ($209/TB); 8-core Xeon server at $1,172; 16-core Xeon server at $4,860]
CUSTOMER SUCCESS STORIES
EDMT Success Story 1: Global Telecom & ISP (US)
• Challenge: SQL DW of structured data (CDR, SMS, billing)
• Solution, step 1: store 16 B CDRs + SMS per day with EDMT (= 85% of the world's SMS data)
• Solution, step 2: enable cross-correlation of CDR data with fully indexed text content
• Benefit: create new services for 900+ telco carriers
EDMT Success Story 2: University Research Clinic and Hospital
• Challenge: email and file archive with text search
• Solution, step 1: store, search, and apply retention to all emails/SMS/IM, with collaboration
• Solution, step 2: add patient insurance payment data and cross-analyze
• Benefit: a full 360-degree view of patients, carriers, and physicians
EDMT Success Story 3: Taxation Office of a European Country
• Challenge: SQL DW database-consolidation project
• Solution, step 1: consolidate 30 years of SQL records for 10 M taxpayers
• Solution, step 2: capture audit data (emails, voicemails, faxes, etc.) for audited taxpayers, for audit, litigation, and compliance purposes
• Benefit: at ZERO extra cost, the tax office gets a 360-degree customer view
EDMT Success Story 4: EU Country Intelligence Agency
• Challenge: email/SMS/IM archive and text search
• Solution, step 1: load and cross-correlate huge volumes of email/SMS to prevent cybercrime, online attacks, web fraud, and digital threats; loads 20+ TB of data per day with real-time, sub-second searches
• Solution, step 2: store financial and travel data (= SQL) and cross-correlate it with emails and SMS in real time; 1,000+ TB (1 PB) in size
• Benefit: previously impossible real-time monitoring and actionable intelligence
COMPARISONS AND SIZING RULES
Price per TB of User Data (Compressed)
EDMT Solution: three-year cost per COMPRESSED TB of user data < $3,000
Download the entire document from:
ftp://public.dhe.ibm.com/software/data/sw-library/infosphere/analyst-reports/ITG-ISAS-Exadata-Teradata.pdf
2012: 1 PB + new HW = a PB for the masses (recap)
[Photo: 40-core Linux server; half of the rack is empty]
• Same data capacity and speed as the 2007 "1 PB" system: 1/15th the physical size, cost, electricity, and weight; deploys in 1 week; 288 TB of raw storage (~$115,000 at $400/TB); 40-core Xeon Linux server (HP DL980)
• Price: SW + HW = ~$500,000; data stored = 1,030 TB; ~$480/TB of data
EDMT 1 PB vs. Hadoop 1 PB (default 3-copy config)
• Hadoop 1 PB (default config): 8 TB of data per node (24 TB raw, with 3x copies); node = 8-core Xeon, 16 GB RAM, 12x 2 TB disks, 2 RU = $4K; HW = 125 nodes (6 racks), 3 PB raw, 1,000 disks = $500,000; power = 125 kW (incl. A/C) = $109,500/year (at $0.10/kWh); ~600 tons of CO2/year (≈ 120 cars)
• EDMT 1 PB (1/2 rack): ~$500K; 10 kW ($9K/year); 50 tons CO2/year (≈ 10 cars)
AMAZON Cloud: 288 TB of storage ("PB")
1. Four monthly payments for cloud storage may pay for 288 TB of EDMT storage; the other 44 months (of a typical 48-month HW cycle) are free
2. The savings could be significant
(An illustrative break-even calculation follows below.)
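An illustrative break-even calculation behind the "4 monthly payments" claim, under assumptions consistent with the slide ($400/TB for EDMT storage, and an assumed ~$100 per TB-month for cloud storage at list prices of the era):

```python
tb = 288
edmt_storage = tb * 400        # $115,200 one-time, at $400/TB per the slide
cloud_monthly = tb * 100.0     # assumed ~$100/TB-month cloud list price

print(edmt_storage / cloud_monthly)   # 4.0 months to break even
print((48 - 4) * cloud_monthly)       # ~$1.27M avoided over a 48-month HW cycle
```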
EDMT 10 PB vs. Hadoop 10 PB
• Hadoop 10 PB: 1,200 servers, 12,000 disks, 60 racks, $5M ($4K/node); 1,200 kW = $1.1M/year in electricity (at $0.10/kWh); ~6,000 tons of CO2/year (≈ 1,200 cars)
• EDMT 10 PB (6 racks): ~$2M-$5M; 100 kW ($90K/year); 500 tons CO2/year (≈ 100 cars)

AMAZON Cloud storage for 10 PB
• ~$300K/month for 3 PB of cloud storage; we sell 3 PB for $1.2M
1. Four monthly payments for cloud storage may pay for 3 PB of EDMT storage; the other 44 months (of a typical 48-month HW cycle) are free
2. The savings could be significant
EDMT Million-Channel Real-Time Ingestor
EDMT: store Hadoop data in EDMT for speed and SQL/ACID
[Diagram: multiple Hadoop v1 clusters feeding the EDMT ingestor]
EDMT® vs. Google Search Appliance (GSA)
1. The EDMT Solution (model "L") can handle more data than the Google GSA
2. GSA is more expensive "per document" than EDMT®
GSA pricing via Dell.com: http://search.dell.com/results.aspx?s=gen&c=us&l=en&cs=&k=gb-7007&cat=all&x=7&y=6
BMMsoft: Services and Products
• Assessment of your current BI and Big Data situation
• Design of a "Dream BI" to meet your future BI and Big Data needs
• EDMT Solution (on any supported DB platform)
• 2-hour consultation blocks