
Green Plum IIIT- Allahabad

Date post: 14-Apr-2017
Upload: iiit-allahabad
Transcript
Page 1: Green Plum IIIT- Allahabad

Data Computing Division

© Copyright 2011 EMC Corporation. All rights reserved.

Presented by

Brijesh Kumar Awasthi

IMP2014002

Page 2: Green Plum IIIT- Allahabad

What is Greenplum?

• Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan.

• It was a merger of two smaller companies, Metapa in Los Angeles and Didera in Fairfax, Virginia.

• Greenplum, based in San Mateo, California, released its database management system software in April 2005, calling it Bizgres.

Page 3: Green Plum IIIT- Allahabad

EMC Acquires Greenplum

Greenplum Becomes the Foundation of EMC’s Data Computing Division

“Greenplum, with expertise in the massively parallel arena, will give the storage giant a boost in big-data computing.”

– InformationWeek –

“For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant….”

– Gartner

Page 4: Green Plum IIIT- Allahabad

What the COO of EMC said about Greenplum and BI

Page 5: Green Plum IIIT- Allahabad


New Realities… New Demands!

• Do it faster
– Ingest more data
– Ingest it faster
– Keep it unsummarised, keep it for longer

• Be more responsive
– Unpredictable queries, rapidly evolving bespoke analytics
– New tools: Hadoop, MapReduce, Hive, HBase, “R”

• Manage new data types
– Manage and allow queries across structured, semi-structured and unstructured data

• Do it at a lower cost

Big Data will revolutionize data warehousing and analysis.

Page 6: Green Plum IIIT- Allahabad


Why Greenplum?

• Fast Data Loading

• Extreme Performance & Elastic Scalability

• Unified Data Access


• EMC Greenplum is a shared nothing, massively parallel processing (MPP) data warehouse system

• Core principle of data computing is to move the processing dramatically closer to the data and to the people

Page 7: Green Plum IIIT- Allahabad


Architecture overview (diagram labels):

• Master Server: query planning & dispatch (primary server, plus hot failover)
• Segment Servers: query processing & data storage
• Network Interconnect
• Data Sources (loading, streaming, etc.): external files, URLs, Hadoop (HDFS), web services (including from other DBs), O/S pipes (including from other DBs)
• Hadoop MapReduce
• Clients: standard Business Intelligence and analytical tools (SQL, BI tools, analytical tools)

Callouts:

• Clients see a single database.
• Queries are distributed across all available resources.
• Shared nothing, massively parallel processing means no bottlenecks and linear scalability.
• Data loading also takes advantage of the MPP architecture.
• Greenplum handles structured, semi-structured and unstructured data.


Page 8: Green Plum IIIT- Allahabad


Why is MPP different?

Greenplum is a scale-out architecture on standard commodity hardware.

MPP
• Queries shipped to each node simultaneously
• Execute in parallel on each segment instance
• Multiple pipelines of data
• Highly scalable topology
• Locks and buffers not shared

Traditional
• Single database buffer used by all user operations
• More locks mean a more complex lock management system
• Single pipe to data
• Limited scalability

Page 9: Green Plum IIIT- Allahabad

Partitioning: The Key to Parallelism

Strategy: Spread data evenly across as many nodes (and disks) as possible

Greenplum Database High Speed Loader


Order

Order #   Order Date     Customer ID
43        Oct 20 2005    12
64        Oct 20 2005    111
45        Oct 20 2005    42
46        Oct 20 2005    64
77        Oct 20 2005    32
48        Oct 20 2005    12
50        Oct 20 2005    34
56        Oct 20 2005    213
63        Oct 20 2005    15
44        Oct 20 2005    102
53        Oct 20 2005    82
55        Oct 20 2005    55
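
The strategy on this slide, spreading rows evenly across nodes, can be illustrated with a minimal sketch. It assumes rows are assigned to segments by hashing a distribution key (here the order number). The function names `segment_for` and `distribute` are hypothetical, and `zlib.crc32` only stands in for whatever hash function the database uses internally.

```python
import zlib
from collections import defaultdict

# Rows from the slide's Order table: (order_no, order_date, customer_id).
ORDERS = [
    (43, "Oct 20 2005", 12), (64, "Oct 20 2005", 111), (45, "Oct 20 2005", 42),
    (46, "Oct 20 2005", 64), (77, "Oct 20 2005", 32), (48, "Oct 20 2005", 12),
    (50, "Oct 20 2005", 34), (56, "Oct 20 2005", 213), (63, "Oct 20 2005", 15),
    (44, "Oct 20 2005", 102), (53, "Oct 20 2005", 82), (55, "Oct 20 2005", 55),
]


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment index with a stable hash.

    crc32 is an illustrative stand-in, not the database's real hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, num_segments, key_index=0):
    """Group rows by the segment that their distribution key hashes to."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_index], num_segments)].append(row)
    return segments


if __name__ == "__main__":
    for seg, rows in sorted(distribute(ORDERS, num_segments=4).items()):
        print(f"segment {seg}: {[r[0] for r in rows]}")
```

Hashing a high-cardinality column such as the order number tends to give a more even spread than hashing a column with few distinct values, which is why the choice of distribution key matters for parallelism.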

Page 10: Green Plum IIIT- Allahabad

Greenplum Database: Powerful Data Loading Capabilities

• Industry leading performance:
– >10TB per hour per rack

• Innovative, parallel-everything architecture:
– Scatter-Gather Streaming™ provides true linear scaling
– Support for both large-batch and continuous real-time loading strategies
– Enable complex data transformations “in-flight”
– Transparent interfaces to loading via support files, application and services


Page 11: Green Plum IIIT- Allahabad

Traditional Loading vs Greenplum DB Parallel Loading

[Diagram: two panels, conventional loading and Greenplum parallel loading, each showing ETL servers, the interconnect and the segment nodes.]


Page 12: Green Plum IIIT- Allahabad

Advanced pipeline process for fast operation

[Diagram: the client sends a sort request to the master server, which passes it to the segment servers.]

Values on the segment servers (grid as shown on the slide):

9    6    10
2    11   5
4    3    12
1    7    8

Page 13: Green Plum IIIT- Allahabad

Advanced pipeline process for fast operation

[Diagram: client, master server and segment servers.]

Values on the segment servers (grid as shown on the slide):

1    3    5
2    6    8
4    7    10
9    11   12
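
The two slides above appear to show a sort request in which each segment first sorts its own data. A minimal sketch of that idea follows. It assumes each column of the slide's grid is one segment's data (an interpretation, not stated in the deck) and that the master combines the sorted runs with a k-way merge. The names `local_sort`, `master_merge` and `parallel_sort` are hypothetical.

```python
import heapq

# Assumption: each column of the grid on the first slide is one segment's data.
SEGMENTS = [
    [9, 2, 4, 1],
    [6, 11, 3, 7],
    [10, 5, 12, 8],
]


def local_sort(segment_rows):
    """Work done on a single segment: sort only the rows stored there."""
    return sorted(segment_rows)


def master_merge(sorted_runs):
    """Work done on the master: k-way merge of already sorted runs."""
    return list(heapq.merge(*sorted_runs))


def parallel_sort(segments):
    # The local sorts run one after another here; on a real cluster each
    # segment would sort its own rows at the same time as the others.
    runs = [local_sort(rows) for rows in segments]
    return master_merge(runs)


if __name__ == "__main__":
    print([local_sort(rows) for rows in SEGMENTS])
    print(parallel_sort(SEGMENTS))
```

Under that reading, the locally sorted columns match the grid on the second slide, and the master only has to merge the runs rather than sort the full data set itself.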

Page 14: Green Plum IIIT- Allahabad

Greenplum Database: Extreme Performance

• Optimized for BI and Analytics
– Rich eco-system of partners

• Provides automatic parallelization
– Just load and query like any database
– Tables are automatically distributed across nodes
– No need for manual partitioning or tuning

• Extremely scalable MPP shared-nothing architecture
– All nodes can scan and process in parallel
– Linear scalability by adding nodes


Page 15: Green Plum IIIT- Allahabad

Platform Independence Delivers Choice and Flexibility

Virtualized Infrastructure
• Pool resources
• Elastic scalability

Data Computing Appliance
• Optimized price/performance
• Minimum time-to-value
• Ideal for production environments

Software-Only
• On your x86 hardware
• Flexibility for any workload


Page 16: Green Plum IIIT- Allahabad

Greenplum Polymorphic Data Storage

[Diagram: table ‘Customer’ partitioned by month, Jan ’09 to Nov ’09, with partitions labelled Column-Oriented Archival Compression, Column-Oriented Fast Compression and Row-Oriented Fast Compression.]

• Greenplum Database’s engine provides a flexible storage model
– Four table types: heap, row-oriented, column-oriented, external
– Block compression: Gzip (levels 1-9), QuickLZ

• Storage types can be mixed within a database, and even within a table
– Fully configurable via table DDL and partitioning syntax
– You may also choose to index some partitions and not others

• Gives customers the choice of processing model for any table or partition
– Tables/partitions of different storage types can be joined together without restriction
– Highly tuned, e.g. columnar does efficient pre-projection and parallel execution
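
Mixing storage types within one table can be pictured as a per-partition policy. The sketch below is only an illustration: the slide does not say which months use which storage type, so the age thresholds (`hot_months`, `warm_months`) and the rule that older partitions get column-oriented archival compression are assumptions, and `Storage` and `storage_for` are hypothetical names rather than Greenplum syntax.

```python
from dataclasses import dataclass

# Monthly partitions of the 'Customer' table as listed on the slide.
MONTHS = ["Jan '09", "Feb '09", "Mar '09", "Apr '09", "May '09", "Jun '09",
          "Jul '09", "Aug '09", "Sept '09", "Oct '09", "Nov '09"]


@dataclass(frozen=True)
class Storage:
    orientation: str  # "row" or "column"
    compression: str  # "fast" or "archival"


def storage_for(age_months, hot_months=3, warm_months=6):
    """Pick a storage type from a partition's age (thresholds are assumed)."""
    if age_months < 0:
        raise ValueError("age_months must be non-negative")
    if age_months < hot_months:
        return Storage("row", "fast")       # recent data, frequently updated
    if age_months < warm_months:
        return Storage("column", "fast")    # older data, mostly scanned
    return Storage("column", "archival")    # oldest data, rarely read


def plan_partitions(months):
    """Map each partition to a storage type, treating the last month as newest."""
    newest = len(months) - 1
    return {month: storage_for(newest - i) for i, month in enumerate(months)}


if __name__ == "__main__":
    for month, storage in plan_partitions(MONTHS).items():
        print(month, storage)
```

In the database itself, this choice would be expressed through the table DDL and partitioning syntax that the slide mentions.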

Page 17: Green Plum IIIT- Allahabad

Unified Data Access Across The Enterprise

• Workload Management
– Connection management controls how many users can be connected and assigns them to a queue
– User-based resource queues allow for control of the total number or cost of queries allowed at any point in time

• Dynamic Query Prioritization
– Patent-pending technique of dynamically balancing resources across running queries
– Allows DBAs to control query priorities in real time, or determine default priorities by resource queue
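
A resource queue of the kind described above can be sketched as a simple admission controller. This is an illustrative model, not Greenplum's implementation: the class name `ResourceQueue`, its parameters, and the first-in first-out wake-up order are assumptions, and it enforces both a count limit and a cost limit, although the slide presents them as alternatives.

```python
from collections import deque


class ResourceQueue:
    """Admit queries while staying under an active-count and a total-cost limit."""

    def __init__(self, max_active, max_cost):
        self.max_active = max_active
        self.max_cost = max_cost
        self.running = {}        # query_id -> planner cost estimate
        self.waiting = deque()   # (query_id, cost) pairs in arrival order

    def _fits(self, cost):
        return (len(self.running) < self.max_active
                and sum(self.running.values()) + cost <= self.max_cost)

    def submit(self, query_id, cost):
        """Start the query if limits allow; otherwise queue it. Returns True if started."""
        # Queued queries go first, so a newcomer never overtakes them.
        if not self.waiting and self._fits(cost):
            self.running[query_id] = cost
            return True
        self.waiting.append((query_id, cost))
        return False

    def finish(self, query_id):
        """Release a finished query's slot and start waiting queries that now fit."""
        del self.running[query_id]
        # Note: a query whose cost exceeds max_cost would wait forever in this sketch.
        while self.waiting and self._fits(self.waiting[0][1]):
            waiting_id, cost = self.waiting.popleft()
            self.running[waiting_id] = cost


if __name__ == "__main__":
    queue = ResourceQueue(max_active=2, max_cost=100)
    print(queue.submit("a", 60), queue.submit("b", 50), queue.submit("c", 10))
    queue.finish("a")
    print(sorted(queue.running))
```

Dynamic prioritization, the second bullet group, would act on queries that are already running and is outside the scope of this sketch.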


Page 18: Green Plum IIIT- Allahabad

Greenplum Performance Monitor

Highly interactive web-based performance monitoring

Real-time and historic views of:

• Resource utilization

• Queries and query internals


Page 19: Green Plum IIIT- Allahabad

Key Technical Requirements for HPA


Technical Values
• Performance: massively parallel architecture
• Load speeds: 10TB/hr
• Integration with SAS
• In-database analytics using Java, PL/R, etc.
• Integration with many more BI and analytical tools
• Integration with Hadoop for unstructured data analysis

Financial Value
• Lower total cost of ownership
• Best price/performance ratio in the industry for EDW/analytical appliance

Operational Values
• No index maintenance
• Backup and recovery solution
• Most robust disaster recovery solution in the industry
• Best technical and customer support organization backing

Page 20: Green Plum IIIT- Allahabad

Greenplum Customers: Government

• Pacific Northwest National Labs (Dept. of Energy) does cyber analytics.

• USAspending.gov traces the outlays of the US Federal Government.

• The Federal Reserve Bank of Kansas City does economic analysis, mostly related to the housing market.

• Recently, the Internal Revenue Service purchased a DCA (Data Computing Appliance) to do work related to fraudulent tax returns.

• ATO uses GP as an investigatory tool in their Compliance and Audit Logging Unit.


Page 21: Green Plum IIIT- Allahabad

High Performance Analytics

‘The power to know fast’

Thank you

Page 22: Green Plum IIIT- Allahabad

Questions?


