Greenplum® Database 4.1 Administrator Guide P/N: 300-012-428 Rev: A03 The Data Computing Division of EMC
Copyright 2011 EMC Corporation. All rights reserved. EMC
believes the information in this publication is accurate as of its
publication date. The information is subject to change without
notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC
CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH
RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY
DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE. Use, copying, and distribution of any EMC
software described in this publication requires an applicable
software license. For the most up-to-date listing of EMC product
names, see EMC Corporation Trademarks on EMC.com. All other
trademarks used herein are the property of their respective
owners.
Greenplum Database Administrator Guide 4.1 - Contents
Preface
  About This Guide
  Document Conventions
  Text Conventions
  Command Syntax Conventions
  Getting Support
  Product information
  Technical support
Section I: Introduction to Greenplum
Chapter 1: About the Greenplum Architecture
  About the Greenplum Master
  About the Greenplum Segments
  About the Greenplum Interconnect
  About Redundancy and Failover in Greenplum Database
  About Segment Mirroring
  About Master Mirroring
  About Interconnect Redundancy
  About Parallel Data Loading
  About Management and Monitoring
Chapter 2: About Distributed Databases
  Understanding How Data is Stored
  Understanding Greenplum Distribution Policies
Chapter 3: Summary of Greenplum Features
  Greenplum SQL Standard Conformance
  Core SQL Conformance
  SQL 1992 Conformance
  SQL 1999 Conformance
  SQL 2003 Conformance
  SQL 2008 Conformance
  Greenplum and PostgreSQL Compatibility
Chapter 4: About Greenplum Query Processing
  Understanding Query Planning and Dispatch
  Understanding Greenplum Query Plans
  Understanding Parallel Query Execution
Section II: Access Control and Security
Chapter 5: Managing Roles and Privileges
  Security Best Practices for Roles and Privileges
  Creating New Roles (Users)
  Altering Role Attributes
  Creating Groups (Role Membership)
  Managing Object Privileges
  Simulating Row and Column Level Access Control
  Encrypting Data
Chapter 6: Configuring Client Authentication
  Allowing Connections to Greenplum Database
  Editing the pg_hba.conf File
  Limiting Concurrent Connections
  Encrypting Client/Server Connections
Chapter 7: Accessing the Database
  Establishing a Database Session
  Supported Client Applications
  Greenplum Database Client Applications
  pgAdmin III for Greenplum Database
  Database Application Interfaces
  Third-Party Client Tools
  Troubleshooting Connection Problems
Chapter 8: Managing Workload and Resources
  Overview of Greenplum Workload Management
  How Resource Queues Work in Greenplum Database
  Steps to Enable Workload Management
  Configuring Workload Management
  Creating Resource Queues
  Creating Queues with an Active Query Limit
  Creating Queues with Memory Limits
  Creating Queues with Query Planner Cost Limits
  Setting Priority Levels
  Assigning Roles (Users) to a Resource Queue
  Removing a Role from a Resource Queue
  Modifying Resource Queues
  Altering a Resource Queue
  Dropping a Resource Queue
  Checking Resource Queue Status
  Viewing Queued Statements and Resource Queue Status
  Viewing Resource Queue Statistics
  Viewing the Roles Assigned to a Resource Queue
  Viewing the Waiting Queries for a Resource Queue
  Clearing a Waiting Statement From a Resource Queue
  Viewing the Priority of Active Statements
  Resetting the Priority of an Active Statement
Section III: Database Administration
Chapter 9: Defining Database Objects
  Creating and Managing Databases
  About Template Databases
  Creating a Database
  Viewing the List of Databases
  Altering a Database
  Dropping a Database
  Creating and Managing Tablespaces
  Creating a Filespace
  Creating a Tablespace
  Using a Tablespace to Store Database Objects
  Viewing Existing Tablespaces and Filespaces
  Dropping Tablespaces and Filespaces
  Creating and Managing Schemas
  The Default Public Schema
  Creating a Schema
  Schema Search Paths
  Dropping a Schema
  System Schemas
  Creating and Managing Tables
  Creating a Table
  Altering a Table
  Dropping a Table
  Partitioning Large Tables
  Understanding Table Partitioning in Greenplum Database
  Deciding on a Table Partitioning Strategy
  Creating Partitioned Tables
  Loading Partitioned Tables
  Verifying Your Partition Strategy
  Viewing Your Partition Design
  Maintaining Partitioned Tables
  Creating and Using Sequences
  Creating a Sequence
  Using a Sequence
  Altering a Sequence
  Dropping a Sequence
  Using Indexes in Greenplum Database
  Index Types
  Creating an Index
  Examining Index Usage
  Managing Indexes
  Dropping an Index
  Creating and Managing Views
  Creating Views
  Dropping Views
Chapter 10: Managing Data
  About Concurrency Control in Greenplum Database
  Inserting New Rows
  Updating Existing Rows
  Deleting Rows
  Truncating a Table
  Working With Transactions
  Transaction Isolation Levels
  Vacuuming the Database
  Configuring the Free Space Map
Chapter 11: Querying Data
  Defining Queries
  SQL Lexicon
  SQL Value Expressions
  Using Functions and Operators
  Using Functions in Greenplum Database
  User-Defined Functions
  Built-in Functions and Operators
  Query Profiling
  Reading EXPLAIN Output
  Reading EXPLAIN ANALYZE Output
  What to Look for in a Query Plan
Chapter 12: Loading and Unloading Data
  Greenplum Database Loading Tools Overview
  About External Tables
  About gpload
  About COPY
  Loading Data into Greenplum Database
  Accessing File-Based External Tables
  Defining External Tables - Examples
  Using the Greenplum Parallel File Server (gpfdist)
  Using Hadoop Distributed File System (HDFS) Tables
  Creating and Using Web External Tables
  Loading Data Using an External Table
  Handling Load Errors
  Loading Data from Greenplum Database
  Loading Data with gpload
  Loading Data with the gphdfs Protocol
  Loading Data with COPY
  Data Loading Performance Tips
  Unloading Data from Greenplum Database
  Defining a File-Based Writable External Table
  Defining a Command-Based Writable External Web Table
  Unloading Data Using a Writable External Table
  Unloading Data Using COPY
  Readable External Tables and Query Planner Statistics
  Formatting Data Files
  Formatting Rows
  Formatting Columns
  Representing NULL Values
  Escaping
  Character Encoding
Section IV: System Administration
Chapter 13: Starting and Stopping Greenplum
  Overview
  Starting Greenplum Database
  Restarting Greenplum Database
  Uploading Configuration File Changes Only
  Starting the Master in Maintenance Mode
  Stopping Greenplum Database
Chapter 14: Configuring Your Greenplum System
  About Greenplum Master and Local Parameters
  Setting Configuration Parameters
  Setting a Local Configuration Parameter
  Setting a Master Configuration Parameter
  Viewing Settings of Server Configuration Parameters
  Configuration Parameter Categories
  Connection and Authentication Parameters
  System Resource Consumption Parameters
  Query Tuning Parameters
  Error Reporting and Logging Parameters
  System Monitoring Parameters
  Runtime Statistics Collection Parameters
  Automatic Statistics Collection Parameters
  Client Connection Default Parameters
  Lock Management Parameters
  Workload Management Parameters
  External Table Parameters
  Append-Only Table Parameters
  Database and Tablespace/Filespace Parameters
  Past PostgreSQL Version Compatibility Parameters
  Greenplum Array Configuration Parameters
Chapter 15: Enabling High Availability Features
  Overview of High Availability in Greenplum Database
  Overview of Segment Mirroring
  Overview of Master Mirroring
  Overview of Fault Detection and Recovery
  Enabling Mirroring in Greenplum Database
  Enabling Segment Mirroring
  Enabling Master Mirroring
  Knowing When a Segment is Down
  Enabling Alerts and Notifications
  Checking for Failed Segments
  Checking the Log Files
  Recovering a Failed Segment
  Recovering From Segment Failures
  Recovering a Failed Master
  Restoring Master Mirroring After a Recovery
Chapter 16: Backing Up and Restoring Databases
  Overview of Backup and Restore Operations
  About Parallel Backups
  About Non-Parallel Backups
  About Parallel Restores
  About Non-Parallel Restores
  Backing Up a Database
  Backing Up a Database with gp_dump
  Automating Parallel Backups with gpcrondump
  Restoring From Parallel Backup Files
  Restoring a Database with gp_restore
  Restoring a Database Using gpdbrestore
  Restoring to a Different Greenplum System Configuration
Chapter 17: Expanding a Greenplum System
  Planning Greenplum System Expansion
  System Expansion Overview
  System Expansion Checklist
  Planning New Hardware Platforms
  Planning Initialization of New Segments
  Planning Table Redistribution
  Preparing and Adding Nodes
  Adding New Nodes to the Trusted Host Environment
  Verifying OS Settings
  Validating Disk I/O and Memory Bandwidth
  Integrating New Hardware into the System
  Initializing New Segments
  Creating an Input File for System Expansion
  Running gpexpand to Initialize New Segments
  Rolling Back a Failed Expansion Setup
  Redistributing Tables
  Ranking Tables for Redistribution
  Redistributing Tables Using gpexpand
  Monitoring Table Redistribution
  Removing the Expansion Schema
Chapter 18: Monitoring a Greenplum System
  Monitoring Database Activity and Performance
  Monitoring System State
  Enabling System Alerts and Notifications
  Checking System State
  Checking Disk Space Usage
  Checking for Data Distribution Skew
  Viewing Metadata Information about Database Objects
  Viewing the Database Server Log Files
  Log File Format
  Searching the Greenplum Database Server Log Files
  Using gp_toolkit
Chapter 19: Routine System Maintenance Tasks
  Routine Vacuum and Analyze
  Transaction ID Management
  System Catalog Maintenance
  Vacuum and Analyze for Query Optimization
  Routine Reindexing
  Managing Greenplum Database Log Files
  Database Server Log Files
  Management Utility Log Files
Section V: Performance Tuning
Chapter 20: Defining Database Performance
  Understanding the Performance Factors
  System Resources
  Workload
  Throughput
  Contention
  Optimization
  Determining Acceptable Performance
  Baseline Hardware Performance
  Performance Benchmarks
Chapter 21: Common Causes of Performance Issues
  Identifying Hardware and Segment Failures
  Managing Workload
  Avoiding Contention
  Maintaining Database Statistics
  Identifying Statistics Problems in Query Plans
  Tuning Statistics Collection
  Optimizing Data Distribution
  Optimizing Your Database Design
  Greenplum Database Maximum Limits
Chapter 22: Investigating a Performance Problem
  Checking System State
  Checking Database Activity
  Checking for Active Sessions (Workload)
  Checking for Locks (Contention)
  Checking Query Status and System Utilization
  Troubleshooting Problem Queries
  Investigating Error Messages
  Gathering Information for Greenplum Support
Section VI: Extending Greenplum Database
Chapter 23: Using Greenplum MapReduce
  About Greenplum MapReduce
  The Basics of MapReduce
  How Greenplum MapReduce Works
  Programming Greenplum MapReduce
  Defining Inputs
  Defining Map Functions
  Defining Reduce Functions
  Defining Outputs
  Defining Tasks
  Putting Together a Complete MapReduce Specification
  Submitting MapReduce Jobs for Execution
  Troubleshooting Problems with MapReduce Jobs
  Language Does Not Exist
  Generic Python Iterator Error
  Function Defined Using Wrong MODE
Section VII: References
Appendix A: SQL Command Reference
  SQL Syntax Summary
  ABORT
  ALTER AGGREGATE
  ALTER CONVERSION
  ALTER DATABASE
  ALTER DOMAIN
  ALTER EXTERNAL TABLE
  ALTER FILESPACE
  ALTER FUNCTION
  ALTER GROUP
  ALTER INDEX
  ALTER LANGUAGE
  ALTER OPERATOR
  ALTER OPERATOR CLASS
  ALTER RESOURCE QUEUE
  ALTER ROLE
  ALTER SCHEMA
  ALTER SEQUENCE
  ALTER TABLE
  ALTER TABLESPACE
  ALTER TRIGGER
  ALTER TYPE
  ALTER USER
  ANALYZE
  BEGIN
  CHECKPOINT
  CLOSE
  CLUSTER
  COMMENT
  COMMIT
  COPY
  CREATE AGGREGATE
  CREATE CAST
  CREATE CONVERSION
  CREATE DATABASE
  CREATE DOMAIN
  CREATE EXTERNAL TABLE
  CREATE FUNCTION
  CREATE GROUP
  CREATE INDEX
  CREATE LANGUAGE
  CREATE OPERATOR
  CREATE OPERATOR CLASS
  CREATE RESOURCE QUEUE
  CREATE ROLE
  CREATE RULE
11. Greenplum Database Administrator Guide 4.1 - Contents
CREATE SCHEMA
CREATE SEQUENCE
CREATE TABLE
CREATE TABLE AS
CREATE TABLESPACE
CREATE TRIGGER
CREATE TYPE
CREATE USER
CREATE VIEW
DEALLOCATE
DECLARE
DELETE
DROP AGGREGATE
DROP CAST
DROP CONVERSION
DROP DATABASE
DROP DOMAIN
DROP EXTERNAL TABLE
DROP FILESPACE
DROP FUNCTION
DROP GROUP
DROP INDEX
DROP LANGUAGE
DROP OPERATOR
DROP OPERATOR CLASS
DROP OWNED
DROP RESOURCE QUEUE
DROP ROLE
DROP RULE
DROP SCHEMA
DROP SEQUENCE
DROP TABLE
DROP TABLESPACE
DROP TRIGGER
DROP TYPE
DROP USER
DROP VIEW
END
EXECUTE
EXPLAIN
FETCH
GRANT
INSERT
LOAD
LOCK
MOVE
PREPARE
REASSIGN OWNED
REINDEX
20. Greenplum Database Administrator Guide 4.1 - Contents
Viewing Greenplum Database Server Log Files
gp_log_command_timings
gp_log_database
gp_log_master_concise
gp_log_system
Checking Server Configuration Files
gp_param_setting('parameter_name')
gp_param_settings_seg_value_diffs
Checking for Failed Segments
gp_pgdatabase_invalid
Checking Resource Queue Activity and Status
gp_resq_activity
gp_resq_activity_by_queue
gp_resq_priority_statement
gp_resq_role
gp_resqueue_status
Viewing Users and Groups (Roles)
gp_roles_assigned
Checking Database Object Sizes and Disk Space
gp_size_of_all_table_indexes
gp_size_of_database
gp_size_of_index
gp_size_of_partition_and_indexes_disk
gp_size_of_schema_disk
gp_size_of_table_and_indexes_disk
gp_size_of_table_and_indexes_licensing
gp_size_of_table_disk
gp_size_of_table_uncompressed
gp_disk_free
Checking for Uneven Data Distribution
gp_skew_coefficients
gp_skew_idle_fractions
Appendix J: Oracle Compatibility Functions
Installing Oracle Compatibility Functions
Oracle and Greenplum Implementation Differences
Available Oracle Compatibility Functions
decode
nvl
Appendix K: Character Set Support
Setting the Character Set
Character Set Conversion Between Server and Client
Appendix L: SQL 2008 Optional Feature Compliance
Glossary
Index
21. Greenplum Database Administrator Guide 4.1 Preface

Preface

This guide provides information for system administrators and
database superusers responsible for administering a Greenplum
Database system.

About This Guide
Document Conventions
Getting Support

About This Guide

This guide provides information and instructions for configuring,
maintaining and using a Greenplum Database system. This guide is
intended for system and database administrators responsible for
managing a Greenplum Database system.

This guide assumes knowledge of Linux/UNIX system administration,
database management systems, database administration, and structured
query language (SQL).

Because Greenplum Database is based on PostgreSQL 8.2.15, this guide
assumes some familiarity with PostgreSQL. Links and cross-references
to PostgreSQL documentation are provided throughout this guide for
features that are similar to those in Greenplum Database.

This guide contains the following main sections:

Section I, Introduction to Greenplum explains the distributed
architecture and parallel processing concepts of Greenplum Database.

Section II, Access Control and Security explains how clients connect
to a Greenplum Database system, and how to configure access control
and workload management.

Section III, Database Administration explains how to do basic
database administration tasks such as defining database objects,
loading data, writing queries and managing data.

Section IV, System Administration explains the various system
administration tasks of Greenplum Database such as configuring the
server, monitoring system activity, enabling high-availability,
backing up and restoring databases, and other routine system
administration tasks.

Section V, Performance Tuning provides guidance on identifying and
troubleshooting the most common causes of performance issues in
Greenplum Database.

Section VI, Extending Greenplum Database describes how to extend the
functionality of Greenplum Database by developing your own functions
and programs.

Section VII, References contains reference documentation for SQL
commands, command-line utilities, client programs, system catalogs,
and configuration parameters.
22. Greenplum Database Administrator Guide 4.1 Preface

Document Conventions

The following conventions are used throughout the Greenplum Database
documentation to help you identify certain types of information.

Text Conventions
Command Syntax Conventions

Text Conventions

Table 0.1 Text Conventions

Text Convention: bold
Usage: Button, menu, tab, page, and field names in GUI applications
Examples: Click Cancel to exit the page without saving your changes.

Text Convention: italics
Usage: New terms where they are defined. Database objects, such as
schema, table, or column names.
Examples: The master instance is the postgres process that accepts
client connections. Catalog information for Greenplum Database
resides in the pg_catalog schema.

Text Convention: monospace
Usage: File names and path names. Programs and executables. Command
names and syntax. Parameter names.
Examples: Edit the postgresql.conf file. Use gpstart to start
Greenplum Database.

Text Convention: monospace italics
Usage: Variable information within file paths and file names.
Variable information within command syntax.
Examples: /home/gpadmin/config_file
COPY tablename FROM 'filename'

Text Convention: monospace bold
Usage: Used to call attention to a particular part of a command,
parameter, or code snippet.
Examples: Change the host name, port, and database name in the JDBC
connection URL: jdbc:postgresql://host:5432/mydb

Text Convention: UPPERCASE
Usage: Environment variables. SQL commands. Keyboard keys.
Examples: Make sure that the Java /bin directory is in your $PATH.
SELECT * FROM my_table;
Press CTRL+C to escape.
23. Greenplum Database Administrator Guide 4.1 Preface

Command Syntax Conventions

Table 0.2 Command Syntax Conventions

Text Convention: { }
Usage: Within command syntax, curly braces group related command
options. Do not type the curly braces.
Examples: FROM { 'filename' | STDIN }

Text Convention: [ ]
Usage: Within command syntax, square brackets denote optional
arguments. Do not type the brackets.
Examples: TRUNCATE [ TABLE ] name

Text Convention: ...
Usage: Within command syntax, an ellipsis denotes repetition of a
command, variable, or option. Do not type the ellipsis.
Examples: DROP TABLE name [, ...]

Text Convention: |
Usage: Within command syntax, the pipe symbol denotes an OR
relationship. Do not type the pipe symbol.
Examples: VACUUM [ FULL | FREEZE ]

Text Convention: $ system_command
# root_system_command
=> gpdb_command
=# su_gpdb_command
Usage: Denotes a command prompt. Do not type the prompt symbol. $ and
# denote terminal command prompts. => and =# denote Greenplum
Database interactive program command prompts (psql or gpssh, for
example).
Examples: $ createdb mydatabase
# chown gpadmin -R /datadir
=> SELECT * FROM mytable;
=# SELECT * FROM pg_database;

Getting Support

EMC support, product, and licensing
information can be obtained as follows.

Product information

For documentation, release notes, software updates, or for
information about EMC products, licensing, and service, go to the EMC
Powerlink website (registration required) at:
http://Powerlink.EMC.com

24. Greenplum Database Administrator Guide 4.1 Preface

Technical support

For technical support, go to Powerlink and choose Support. On the
Support page, you will see several options, including one for making
a service request. Note that to open a service request, you must have
a valid support agreement. Please contact your EMC sales
representative for details about obtaining a valid support agreement
or with questions about your account.
25. Section I: Introduction to Greenplum

Greenplum Database is a massively parallel processing (MPP) database
server based on PostgreSQL open-source technology. MPP (also known as
a shared nothing architecture) refers to systems with two or more
processors which cooperate to carry out an operation, each processor
with its own memory, operating system and disks. Greenplum leverages
this high-performance system architecture to distribute the load of
multi-terabyte data warehouses, and is able to use all of a system's
resources in parallel to process a query.

Greenplum Database is essentially several PostgreSQL database
instances acting together as one cohesive database management system.
It is based on PostgreSQL 8.2.15, and in most cases is very similar
to PostgreSQL with regard to SQL support, features, configuration
options, and end-user functionality. Database users interact with
Greenplum Database as they would with a regular PostgreSQL DBMS.

The internals of PostgreSQL have been modified or supplemented to
support the parallel structure of Greenplum Database. For example,
the system catalog, query planner, optimizer, query executor, and
transaction manager components have been modified and enhanced to
execute queries in parallel across all of the PostgreSQL database
instances at once. The Greenplum interconnect (the networking layer)
enables communication between the distinct PostgreSQL instances and
allows the system to behave as one logical database.

Greenplum Database also includes features designed to optimize
PostgreSQL for business intelligence (BI) workloads. For example,
Greenplum has added parallel data loading (external tables), resource
management, query optimizations and storage enhancements which are
not found in regular PostgreSQL. Many features and optimizations
developed by Greenplum do make their way back into the PostgreSQL
community. For example, table partitioning is a feature developed by
Greenplum which is now in standard PostgreSQL.

To learn more about Greenplum Database, refer to the following
topics:

About the Greenplum Architecture
About Distributed Databases
About Greenplum Query Processing
Summary of Greenplum Features
26. Greenplum Database Administrator Guide 4.1 Chapter 1: About
the Greenplum Architecture

1. About the Greenplum Architecture

Greenplum Database is able to handle the storage and processing of
large amounts of data by distributing the load across several servers
or hosts. A database in Greenplum is actually an array of individual
PostgreSQL databases, all working together to present a single
database image. The master is the entry point to the Greenplum
Database system. It is the database instance where clients connect
and submit SQL statements. The master coordinates the work with the
other database instances in the system, the segments, which handle
data processing and storage.

Figure 1.1 High-Level Greenplum Database Architecture

This section describes all of the components that comprise a
Greenplum Database system, and how they work together:

About the Greenplum Master
About the Greenplum Segments
About the Greenplum Interconnect
About Redundancy and Failover in Greenplum Database
About Parallel Data Loading
About Management and Monitoring
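
To make the division of labor between the master and the segments
more concrete, the following Python sketch models it in miniature.
It is a toy illustration only and not Greenplum code: the Master and
Segment classes, the modulo placement rule, the segment count, and
the sample rows are all assumptions made for this example. It shows
a single entry point that spreads rows across several independent
stores, asks each store for a partial result computed over its own
rows, and merges those partial results into one answer.

```python
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical segment count for this sketch


class Segment:
    """Stands in for one segment: it stores and processes only its own rows."""

    def __init__(self):
        self.rows = []

    def partial_totals(self):
        # Aggregate local rows only; no other segment's data is consulted.
        totals = Counter()
        for region, amount in self.rows:
            totals[region] += amount
        return totals


class Master:
    """Stands in for the master: the single entry point that coordinates segments."""

    def __init__(self, num_segments):
        self.segments = [Segment() for _ in range(num_segments)]

    def insert(self, order_id, region, amount):
        # Assumed placement rule for illustration: key modulo segment count.
        target = self.segments[order_id % len(self.segments)]
        target.rows.append((region, amount))

    def total_by_region(self):
        # Collect one partial result per segment, then merge them.
        combined = Counter()
        for segment in self.segments:
            combined.update(segment.partial_totals())
        return dict(combined)


master = Master(NUM_SEGMENTS)
orders = [
    (1, "east", 10),
    (2, "west", 5),
    (3, "east", 7),
    (4, "north", 2),
    (5, "west", 1),
]
for order_id, region, amount in orders:
    master.insert(order_id, region, amount)

result = master.total_by_region()
print(result)
```

In this sketch the client only ever talks to the Master object, which
mirrors the statement above that clients connect and submit SQL
statements to the master while the segments hold and process the
data. A real system performs the per-segment work concurrently and
over a network layer; the loop here is sequential for simplicity.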
27. Greenplum Database Administrator Guide 4.1 Chapter 1: About
the Greenplum