Data loading and unloading in IBM Netezza

Post on 12-Aug-2015

IBM PureData System for Analytics

(Formerly known as IBM Netezza)

- Ravi

Loading and Unloading Tables

Data Loading/Unloading Components:

• External Tables

• nzload command

• Backup and Restore

• nz_migrate utility

*

External Tables

*

In an IBM Netezza environment, there are the following types of tables:

• System tables: stored on the host

• User tables: stored on the disks in the storage arrays

• External tables: stored as flat files on the host or client systems

An external table allows Netezza to treat an external file as a database table

An external table has a definition (also called a table schema), but the actual data exists outside the Netezza appliance database

Netezza can treat a file on a client system as an external table using the REMOTESOURCE option

You can use INSERT INTO/SELECT FROM on external tables

*

External Tables: Example 1

*

*

External Tables: Loading data through ODBC

*

Managing External Tables

• You can INSERT INTO and DROP an external table

• You can join an external table with database tables

• You cannot DELETE from, TRUNCATE, or UPDATE an external table

• You cannot reference more than one external table in the FROM/WHERE clause of a query or subquery

• You cannot perform a UNION operation between external tables

• Statistics are automatically generated for External Tables

*

External Tables: Unload data

*

Transient External Tables

Transient external tables (TET) provide a way to define an external table that exists only for the duration of a single query

*

Export data using a TET:

create external table '/tmp/customer.out' USING (DELIM '|') AS select * from customer;

Import data using a TET:

truncate table customer;

INSERT INTO CUSTOMER SELECT * FROM EXTERNAL '/tmp/customer.out' USING (DELIM '|');

Compressed Binary Format External Tables

create external table ext_customer sameas customer USING (DATAOBJECT '/tmp/customer1.out' FORMAT 'internal' COMPRESS true);

*

\d customer
                 Table "CUSTOMER"
 Attribute |         Type          | Modifier | Default Value
-----------+-----------------------+----------+---------------
 CID       | SMALLINT              |          |
 CNAME     | CHARACTER VARYING(30) |          |
 CAGE      | BYTEINT               |          |
 CADDRESS  | CHARACTER VARYING(50) |          |
Distributed on hash: "CID"

NZLOAD

The NZLOAD command is a wrapper around the CREATE EXTERNAL TABLE/INSERT INTO commands

NZLOAD allows you to load data from the local host or a remote client

nzload is a command-line program. You can provide inputs to nzload on the command line or through a control file

The nzload command is an ODBC client application that loads data remotely or locally. You can use the nzload command on the Netezza host and on all the supported client platforms.

Statistics are generated for load operations

*

How the nzload command works

Sends queries to the host to create an external table definition

Processes command-line load options

Runs the insert/select query to load data

Drops the external table when the load completes

An nzload operation is treated as a single transaction; that is, all records are loaded with a single transaction ID

If the load fails, the records are logically deleted. The storage space allocated to those records should be recovered later, either with nzreclaim or, if this was the first load into the table, with TRUNCATE TABLE

Other users can run queries against the tables while they are being loaded. New data is only visible to users when the transaction has been committed
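The create/insert/drop sequence described above can be written out by hand. The following is a minimal sketch, not the statements nzload actually issues: it only builds SQL strings, using the SAMEAS, DATAOBJECT and DELIM clauses that appear elsewhere in this deck. The function name and the `ext_` staging-table prefix are assumptions made for illustration.

```python
from typing import List


def manual_load_statements(table: str, data_file: str, delim: str = "|") -> List[str]:
    """Return SQL text mirroring the external-table load steps.

    Nothing is executed and no SQL escaping is performed, so this is
    only suitable for illustration with trusted inputs.
    """
    ext = f"ext_{table}"  # hypothetical name for the temporary external table
    return [
        # Step 1: define an external table over the flat file.
        f"CREATE EXTERNAL TABLE {ext} SAMEAS {table} "
        f"USING (DATAOBJECT ('{data_file}') DELIM '{delim}');",
        # Step 2: load the target table with an insert/select.
        f"INSERT INTO {table} SELECT * FROM {ext};",
        # Step 3: drop the external table definition once the load is done.
        f"DROP TABLE {ext};",
    ]


if __name__ == "__main__":
    for statement in manual_load_statements("customer", "/tmp/customer.out"):
        print(statement)
```

Dropping the external table removes only its definition; whether the flat file is kept is a separate decision.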

*

NZLOAD important options

nzload accepts many options and arguments; the following are required:

• -host <host_name>

• -u <username>

• -pw <password>

• -db <database_name>

• -t <table_name>

Commonly used options and arguments:

• -df <filename> /* data file (input rows to be loaded) */

• -cf <filename> /* control file name */

• -delim <char> /* delimiter. Default is \t */

• -nullValue <char> /* default is NULL. You can change this to any 1 to 4 characters */

• -maxErrors

• -dateDelim

• -dateStyle

• -allowReplay /* enables load continuation if the system paused due to an SPU reset or failover */

nzload -db database_name -t table_name -delim "|" -maxErrors num_errors -df source_file_name
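When nzload is driven from a script, it can help to assemble the argument list in one place. The sketch below only builds the list and runs nothing. It uses only flags listed in this section; the function name, the `sales` database and the file path are hypothetical. The connection flags are optional here only because the sample command above omits them.

```python
from typing import List, Optional


def build_nzload_args(
    db: str,
    table: str,
    data_file: str,
    host: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
    delim: Optional[str] = None,
    max_errors: Optional[int] = None,
) -> List[str]:
    """Return an argv-style list for nzload; the command is not run."""
    args = ["nzload"]
    # Connection options are added only when a value was supplied.
    for flag, value in (("-host", host), ("-u", user), ("-pw", password)):
        if value is not None:
            args += [flag, value]
    args += ["-db", db, "-t", table]
    if delim is not None:
        args += ["-delim", delim]
    if max_errors is not None:
        args += ["-maxErrors", str(max_errors)]
    args += ["-df", data_file]
    return args


if __name__ == "__main__":
    # Hypothetical database, table and file names.
    print(build_nzload_args("sales", "customer", "/tmp/customer.out",
                            delim="|", max_errors=10))
```

Passing the result as a list to a process launcher avoids shell quoting problems with delimiters such as `|`.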

*

NZLOAD: Example 1

*

NZLOAD: Example 2

*

Sample NZLOG/NZBAD files

*

NZLOG file

When nzload is executed, an nzlog file is created; it contains messages related to the load

The nzlog file by default is located in your current working directory

The file name format is <table_name>.<database_name>.nzlog

Use the -lf <file_name> option to specify a different nzlog file name

-outputDir <directory> option may be used to specify the directory for the nzlog file

nzload appends to the same log file for every nzload process that loads into the same database table

Periodically delete log files to free disk space

For nzload operations, a return code is also issued as follows:

• 0 (success)

• 1 (failed, no records inserted)

• 2 (errors were found in the input but did not exceed maxErrors; the load is deemed successful and records are inserted)
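A calling script usually needs to tell a clean load from a load with tolerated rejects. The sketch below encodes the three return codes listed above; the function and constant names are assumptions, and any other code is reported as unexpected rather than guessed at.

```python
# Meanings taken from the return-code list in this section.
NZLOAD_RETURN_CODES = {
    0: "success",
    1: "failed, no records inserted",
    2: "input errors found but maxErrors not exceeded; records inserted",
}


def describe_return_code(return_code: int) -> str:
    """Return a readable description of an nzload return code."""
    return NZLOAD_RETURN_CODES.get(
        return_code, f"unexpected return code {return_code}"
    )


def load_succeeded(return_code: int) -> bool:
    """Codes 0 and 2 both mean that records were inserted."""
    return return_code in (0, 2)


def needs_reject_review(return_code: int) -> bool:
    """Code 2 means some input rows were rejected and should be inspected."""
    return return_code == 2


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, describe_return_code(code), load_succeeded(code))
```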

*

NZBAD file

When nzload is executed, an nzbad file is created; it contains only the records rejected from the load file.

The nzbad file by default is located in your current working directory

The file name format is <table_name>.<database_name>.nzbad

Use the -bf <file_name> option to specify a different nzbad file name

-outputDir <directory> option may be used to specify the directory for the nzbad file

If the file already exists, it is overwritten.

If there are no rejected records the file will be empty (0 bytes)
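The naming convention and the empty-file rule make the output files easy to check from a script. The sketch below is based only on what this deck states: the `<table_name>.<database_name>` file name pattern, the current directory as the default location, and a 0-byte nzbad file meaning no rejects. The function names are assumptions, and the table and database names are used exactly as passed in.

```python
import os


def default_output_path(table: str, database: str, kind: str,
                        output_dir: str = ".") -> str:
    """Build the default nzlog or nzbad path for a table and database."""
    if kind not in ("nzlog", "nzbad"):
        raise ValueError("kind must be 'nzlog' or 'nzbad'")
    return os.path.join(output_dir, f"{table}.{database}.{kind}")


def has_rejected_records(nzbad_path: str) -> bool:
    """Return True when the nzbad file is non-empty, i.e. holds rejected rows."""
    return os.path.getsize(nzbad_path) > 0


if __name__ == "__main__":
    # Hypothetical table and database names.
    print(default_output_path("customer", "sales", "nzlog"))
    print(default_output_path("customer", "sales", "nzbad", "/tmp/loads"))
```

Because the nzbad file is overwritten on each run, check it before starting the next load into the same table.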

*

-maxErrors option (NZLOAD)

*

-maxErrors option (contd …)

*

NZLOAD using Control File

*

NZLOAD using FIXED format

So far we have seen delimited-text loading. However, there are cases where it is difficult to choose a delimiter.

For example, a column may contain lengthy alphanumeric data in which any candidate delimiter character could also appear. In such cases delimited-text loading is impractical, and fixed-length loading must be used instead.
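To make the idea concrete, the sketch below splits a record by character position instead of by delimiter, so a `|` inside a value causes no problem. It illustrates fixed-length records in general and is not nzload's fixed-format syntax. The field widths are hypothetical and loosely modelled on the CUSTOMER table shown earlier.

```python
from typing import Dict, List, Tuple

# (field name, width in characters); the widths are assumptions for this sketch.
LAYOUT: List[Tuple[str, int]] = [
    ("cid", 5),
    ("cname", 30),
    ("cage", 3),
    ("caddress", 50),
]


def parse_fixed(record: str, layout: List[Tuple[str, int]]) -> Dict[str, str]:
    """Split one fixed-length record into named fields by position."""
    fields: Dict[str, str] = {}
    position = 0
    for name, width in layout:
        # Take exactly `width` characters and drop the right-hand padding.
        fields[name] = record[position:position + width].rstrip()
        position += width
    return fields


if __name__ == "__main__":
    # The name contains '|', which would break a '|'-delimited load.
    record = ("42".ljust(5) + "Ann|Lee".ljust(30)
              + "31".ljust(3) + "1 Main St".ljust(50))
    print(parse_fixed(record, LAYOUT))
```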

*

Questions?