Data loading and unloading in ibm netezza

Date post: 12-Aug-2015
Upload: ravikumar-nandigam

Page 1: Data loading and unloading in ibm netezza

IBM PureData System for Analytics

(formerly known as IBM Netezza)

- Ravi

Page 2: Data loading and unloading in ibm netezza

Loading and Unloading Tables

Data Loading/Unloading Components:

• External Tables

• nzload command

• Backup and Restore

• nz_migrate utility

External Tables

In an IBM Netezza environment, there are the following types of tables:

• System tables: stored on the host

• User tables: stored on the disks in the storage arrays

• External tables: stored as flat files on the host or on client systems

An external table allows Netezza to treat an external file as a database table

An external table has a definition (also called a table schema), but the actual data exists outside the Netezza appliance database

Netezza can treat a file on a client system as an external table by using the REMOTESOURCE option

You can use INSERT INTO ... SELECT FROM with external tables

External Tables: Example 1

External Tables: Loading data through ODBC

Managing External Tables

• You can INSERT into and DROP an external table

• You can join an external table with database tables

• You cannot DELETE from, TRUNCATE, or UPDATE an external table

• A query or subquery can reference at most one external table in its FROM or WHERE clause

• UNION operations between external tables are not supported

• Statistics are automatically generated for external tables

External Tables: Unload data

Transient External Tables

Transient external tables (TETs) provide a way to define an external table that exists only for the duration of a single query.

Export data using a TET:

create external table '/tmp/customer.out' USING (DELIM '|') AS select * from customer;

Import data using a TET:

truncate table customer;

INSERT INTO CUSTOMER SELECT * FROM EXTERNAL '/tmp/customer.out' USING (DELIM '|');
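The TET export above writes each row as one line of '|'-separated fields. The Python sketch below mimics that flat-file layout so you can picture what /tmp/customer.out holds. It assumes the string NULL marks null values (the default the slides give for -nullValue) and that values never contain the delimiter, because escaping is out of scope. The function names are hypothetical.

```python
NULL_MARKER = "NULL"  # assumption: default null representation (see -nullValue)
DELIM = "|"           # matches USING (DELIM '|') in the statements above


def to_line(row):
    """Format one row as a '|'-delimited line (no escaping is attempted)."""
    fields = []
    for value in row:
        text = NULL_MARKER if value is None else str(value)
        if DELIM in text:
            # A delimiter inside the data would shift every later column.
            raise ValueError(f"value {text!r} contains the delimiter")
        fields.append(text)
    return DELIM.join(fields)


def from_line(line):
    """Split a line back into fields, mapping the null marker to None."""
    return [None if f == NULL_MARKER else f for f in line.rstrip("\n").split(DELIM)]


# Example row shaped like CUSTOMER (CID, CNAME, CAGE, CADDRESS); values are made up.
print(to_line((1, "Ann", None, "Pune")))
```

Rejecting values that contain '|' is a deliberate simplification: that delimiter collision is the situation fixed-format loading addresses.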

Compressed Binary Format External Tables

create external table ext_customer sameas customer USING (DATAOBJECT '/tmp/customer1.out' FORMAT 'internal' COMPRESS true);

\d customer
                Table "CUSTOMER"
 Attribute |         Type          | Modifier | Default Value
-----------+-----------------------+----------+---------------
 CID       | SMALLINT              |          |
 CNAME     | CHARACTER VARYING(30) |          |
 CAGE      | BYTEINT               |          |
 CADDRESS  | CHARACTER VARYING(50) |          |
Distributed on hash: "CID"

NZLOAD

The nzload command is a wrapper around the CREATE EXTERNAL TABLE and INSERT INTO ... SELECT commands

nzload can load data from the local host or from a remote client

nzload is a command-line program. You can supply its inputs on the command line or in a control file

The nzload command is an ODBC client application that loads data remotely or locally. You can use the nzload command on the Netezza host and on all the supported client platforms.

Statistics are generated for load operations

How the nzload command works

Sends queries to the host to create an external table definition

Processes command-line load options

Runs the insert/select query to load data

Drops the external table when the load completes

An nzload operation is treated as a single transaction; that is, all records are loaded under a single transaction ID

If the load fails, the records are logically deleted. The storage space allocated to them must later be reclaimed, either with nzreclaim or, if this was the first load into the table, with TRUNCATE TABLE

Other users can run queries against the tables while they are being loaded. The new data becomes visible to them only after the transaction has been committed
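The steps and the single-transaction behavior above can be sketched in Python with SQLite standing in for Netezza. A temporary staging table plays the role of the external table: define it, run the insert/select, drop it, and roll everything back if any step fails. This reproduces only the all-or-nothing outcome, not Netezza's logical deletion, nzreclaim, or concurrent readers. The function and staging-table names are hypothetical, and table names are assumed to be trusted because they are interpolated into SQL.

```python
import sqlite3


def all_or_nothing_load(conn, table, rows):
    """Load rows in one transaction, mirroring the nzload steps on SQLite.

    conn must be opened with isolation_level=None so BEGIN/COMMIT are explicit.
    """
    if not rows:
        return 0
    stage = f"{table}_stage"  # hypothetical stand-in for nzload's temporary external table
    placeholders = ", ".join("?" * len(rows[0]))
    conn.execute("BEGIN")
    try:
        # 1. Create a definition shaped like the target (SQLite copies no constraints).
        conn.execute(f"CREATE TEMP TABLE {stage} AS SELECT * FROM {table} WHERE 0")
        conn.executemany(f"INSERT INTO {stage} VALUES ({placeholders})", rows)
        # 2. Run the insert/select; a constraint violation here fails the whole load.
        conn.execute(f"INSERT INTO {table} SELECT * FROM {stage}")
        # 3. Drop the staging table once the load completes.
        conn.execute(f"DROP TABLE {stage}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undoes the staging table and every inserted row
        raise
    return len(rows)
```

A batch containing one bad row leaves the target table exactly as it was, which is the SQLite analogue of a failed nzload.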

NZLOAD: important options

nzload accepts many options and arguments, but the following are required:

• -host <host_name>

• -u <username>

• -pw <password>

• -db <database_name>

• -t <table_name>

Commonly used options and arguments:

• -df <filename> /* data file (input rows to be loaded) */

• -cf <filename> /* control file name */

• -delim <char> /* delimiter; default is \t */

• -nullValue <char> /* default is NULL; can be changed to any string of 1 to 4 characters */

• -maxErrors

• -dateDelim

• -dateStyle

• -allowReplay /* lets the load continue if the system paused because of a SPU reset or failover */

nzload -db database_name -t table_name -delim "|" -maxErrors num_errors -df source_file_name
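The command above can also be assembled from a script. Here is a minimal Python sketch that builds the nzload argument list from the flags on this slide and runs it, assuming nzload is on the PATH. The helper names and the example host, user, and file values are hypothetical.

```python
import subprocess


def nzload_argv(host, user, password, db, table, data_file,
                delim="|", max_errors=None, null_value=None):
    """Build an nzload argument list using only the options listed above."""
    argv = ["nzload",
            "-host", host, "-u", user, "-pw", password,  # required connection options
            "-db", db, "-t", table,                      # required target
            "-df", data_file, "-delim", delim]           # data file and delimiter
    if max_errors is not None:
        argv += ["-maxErrors", str(max_errors)]
    if null_value is not None:
        argv += ["-nullValue", null_value]
    return argv


def run_nzload(argv):
    """Run nzload without a shell, so '|' needs no quoting; returns the exit code."""
    return subprocess.run(argv, check=False).returncode


argv = nzload_argv("nzhost", "admin", "secret", "salesdb", "customer",
                   "/tmp/customer.dat", max_errors=10)
```

Because a list is passed to subprocess.run instead of a shell string, the "|" delimiter needs no quoting. Note that -pw still exposes the password to anyone who can list processes.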

NZLOAD: Example 1

NZLOAD: Example 2

Sample NZLOG/NZBAD files

NZLOG file

Each time nzload runs, it writes an nzlog file containing messages about the load

By default, the nzlog file is created in your current working directory

The file name format is <table_name>.<database_name>.nzlog

Use the -lf <file_name> option to specify a different nzlog file name

Use the -outputDir <directory> option to specify the directory for the nzlog file

Every nzload process that loads the same database table appends to the same log file

Delete old log files periodically to free disk space

nzload also returns one of the following return codes:

• 0 (success)

• 1 (failed; no records inserted)

• 2 (errors found in the input, but not more than -maxErrors allows; the load is deemed successful and the records are inserted)
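The return-code rules above can be written as a small Python sketch, for example to decide in a script whether a load went through. It encodes only the three codes listed here. It reads "did not exceed maxErrors" as bad_records <= max_errors, and that boundary is an assumption worth checking against your nzload version. The names are hypothetical.

```python
RC_MEANING = {
    0: "success",
    1: "failed, no records inserted",
    2: "errors found but within -maxErrors; records inserted",
}


def expected_return_code(bad_records, max_errors):
    """Return code the rules above imply for a given number of bad input records."""
    if bad_records == 0:
        return 0
    if bad_records <= max_errors:  # assumption: "did not exceed" means <=
        return 2
    return 1                       # too many errors: the single transaction fails


def load_succeeded(rc):
    """Codes 0 and 2 both mean the records were inserted."""
    return rc in (0, 2)
```

Treating 2 as success matters in wrappers: a script that checks only for 0 would flag loads that did insert their rows.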

NZBAD file

Each time nzload runs, it also creates an nzbad file containing only the records rejected from the load file

By default, the nzbad file is created in your current working directory

The file name format is <table_name>.<database_name>.nzbad

Use the -bf <file_name> option to specify a different nzbad file name

Use the -outputDir <directory> option to specify the directory for the nzbad file

If the file already exists, it is overwritten

If there are no rejected records, the file is empty (0 bytes)
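After a load, a script often needs to find these files and count the rejects. Here is a Python sketch of that step. It assumes the default <table_name>.<database_name> naming described for the nzlog and nzbad files, no -lf or -bf override, and one rejected record per line in the nzbad file. The function names are hypothetical.

```python
from pathlib import Path


def default_nz_files(table, database, output_dir="."):
    """Default nzlog/nzbad paths: <table>.<database>.nzlog and .nzbad in output_dir."""
    base = Path(output_dir)
    stem = f"{table}.{database}"
    return base / f"{stem}.nzlog", base / f"{stem}.nzbad"


def rejected_count(bad_path):
    """Count rejected records, assuming one record per line (0 for an empty file)."""
    path = Path(bad_path)
    if not path.exists():
        return 0
    with path.open() as f:
        return sum(1 for _ in f)
```

A record with an embedded newline would be counted twice, so treat the count as an estimate when the data can contain line breaks.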

-maxErrors option (NZLOAD)

-maxErrors option (continued)

NZLOAD using a control file

NZLOAD using FIXED format

So far we have seen delimited text loading. However, in some cases it is difficult to choose any delimiter.

For example, a column may contain lengthy alphanumeric data in which any candidate delimiter character can also appear. In such cases, delimited loading is impractical and fixed-length loading has to be used instead.
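To see why fixed-length records sidestep the delimiter problem, here is a Python sketch that slices a record by column widths instead of splitting on a character. The layout and widths are hypothetical, and trailing-space padding is assumed. The sketch illustrates the idea only and does not reproduce nzload's own fixed-format options.

```python
def parse_fixed(line, layout):
    """Split one fixed-length record into named fields.

    layout: list of (field_name, width) pairs in column order (hypothetical layout).
    """
    record = line.rstrip("\n")
    expected = sum(width for _, width in layout)
    if len(record) != expected:
        raise ValueError(f"record is {len(record)} chars, layout needs {expected}")
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = record[pos:pos + width].rstrip()  # drop assumed space padding
        pos += width
    return fields


layout = [("CID", 5), ("CNAME", 10), ("CAGE", 3)]
print(parse_fixed("00001Ann|Lee   030", layout))
```

Because positions, not characters, mark the field boundaries, a value such as "Ann|Lee" loads intact. The trade-off is that every record must match the layout's total width exactly.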

Questions?

