Date post: 02-Jun-2020
Page 1: Sqoop In Actionstatic.roncoo.com/lecturer/da2ea00e057547a18320914bef7dc9e4.pdf · • Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes.

Sqoop In Action
Lecturer: Alex Wang
QQ: 532500648
QQ Communication Group: 286081824

Agenda

• Set up the Sqoop environment
• Import data
• Incremental import
• Free-Form Query Import
• Export data
• Sqoop and Hive

Apache sqoop link page

• http://sqoop.apache.org/
• http://sqoop.apache.org/docs/1.4.6/index.html

Introduction

• Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle or a mainframe into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

• Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

Apache Sqoop-1 Architecture

Apache Sqoop-2 Architecture

Prerequisites

• The following prerequisite knowledge is required for this product:

• Basic computer technology and terminology
• Familiarity with command-line interfaces such as bash
• Relational database management systems
• Basic familiarity with the purpose and operation of Hadoop

Set up the Sqoop environment

• Download the Sqoop tarball and uncompress it.

• Configure the environment variables:

export SQOOP_HOME=/usr/local/sqoop-1.4.3.bin__hadoop-0.20
export PATH=$SQOOP_HOME/bin:$PATH
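After exporting the variables, it can help to confirm that they point at a real installation. The following is a minimal sketch; check_sqoop_env is a hypothetical helper name, not something shipped with Sqoop, and it only checks that an executable launcher exists under $SQOOP_HOME/bin.

```shell
# check_sqoop_env: hypothetical helper, not part of Sqoop.
# Succeeds when SQOOP_HOME is set and contains an executable bin/sqoop.
check_sqoop_env() {
    if [ -z "${SQOOP_HOME:-}" ]; then
        echo "SQOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$SQOOP_HOME/bin/sqoop" ]; then
        echo "no executable found at $SQOOP_HOME/bin/sqoop" >&2
        return 1
    fi
    echo "sqoop launcher found at $SQOOP_HOME/bin/sqoop"
}
```

A typical use would be to run check_sqoop_env followed by sqoop version in a fresh shell.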

Download the database connectors

Introduce the sqoop command

Prepare MySQL

• Install mysql-server
• Create a database (sqoop) for testing
• Create two tables

Import--Transferring an Entire Table

sqoop import \
  --connect jdbc:mysql://master:3306/sqoop \
  --username username \
  --password password \
  --table cities

Import--Specifying a Target Directory

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --target-dir /etl/input/cities

Import--Using --warehouse-dir

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --warehouse-dir /etl/input/

Import--Importing Only a Subset of Data

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --target-dir /alex/input/subset/cities \
  --where "country = 'USA'"

Protecting Your Password

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  -P

Protecting Your Password

echo -n "my-secret-password" > sqoop.password
hadoop dfs -put sqoop.password /user/$USER/sqoop.password
hadoop dfs -chmod 400 /user/$USER/sqoop.password

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --password-file /user/$USER/sqoop.password
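Sqoop reads the entire content of the password file, so a trailing newline would become part of the password. The sketch below prepares the local file without one and restricts it to its owner before it is uploaded. write_password_file is a hypothetical helper name used only for illustration.

```shell
# write_password_file: hypothetical helper, not a Sqoop command.
# $1 = password, $2 = local file path.
# printf '%s' writes the password without a trailing newline; the umask
# keeps the file private from the moment it is created.
write_password_file() {
    ( umask 077 && printf '%s' "$1" > "$2" ) || return 1
    chmod 400 "$2"
}
```

The resulting file can then be copied to HDFS with the put command and locked down there with chmod 400, as in the slide.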

Import--Using a File Format Other Than CSV

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --as-sequencefile

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --as-avrodatafile

Import--Compressing Imported Data

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.BZip2Codec

Import--Speeding Up Transfers

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --direct

Import--Overriding Type Mapping

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --map-column-java id=Long

Import--Controlling Parallelism

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --num-mappers 10

Import--Encoding NULL Values

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --null-string '\\N' \
  --null-non-string '\\N'
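The doubled backslash in '\\N' is easy to get wrong, so here is a small sketch of what the shell actually hands to the program. Inside single quotes the shell leaves backslashes alone, so the argument arrives as three characters. The usual explanation, stated here as an assumption rather than verified behaviour, is that Sqoop places the value into generated Java code, where \\ is the escape for a single backslash, giving the \N marker that Hive expects. The variable name null_arg is only illustrative.

```shell
# Single quotes stop the shell from interpreting backslashes, so the
# value below is exactly three characters: backslash, backslash, N.
null_arg='\\N'

# %s prints the argument verbatim, without processing escapes.
printf '%s\n' "$null_arg"

# Length of the argument as the program would receive it.
echo "${#null_arg}"
```

With double quotes instead of single quotes, the shell would collapse the two backslashes into one before Sqoop ever saw the value.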

Import--Importing All Your Tables

sqoop import-all-tables \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop

sqoop import-all-tables \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --exclude-tables cities,countries

Incremental Import

• So far we’ve covered use cases where you had to transfer an entire table’s contents from the database into Hadoop as a one-time operation. What if you need to keep the imported data on Hadoop in sync with the source table on the relational database side? While you could obtain a fresh copy every day by reimporting all data, that would not be optimal. The amount of time needed to import the data would increase in proportion to the amount of additional data appended to the table daily. This would put an unnecessary performance burden on your database. Why reimport data that has already been imported? For transferring deltas of data, Sqoop offers the ability to do incremental imports.

Importing Only New Data

• Incremental import in append mode will allow you to transfer only the newly created rows. This saves a considerable amount of resources compared with doing a full import every time you need the data to be in sync. One downside is the need to know the value of the last imported row so that next time Sqoop can start off where it ended. Sqoop, when running in incremental mode, always prints out the value of the last imported row. This allows you to easily pick up where you left off.

Importing Only New Data

sqoop import \
  --connect jdbc:mysql://master:3306/sqoop \
  --username root \
  --password root \
  --table cities \
  --target-dir /alex/input/append \
  --incremental append \
  --check-column id \
  --last-value 1
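Because Sqoop prints the last imported value at the end of an incremental run, a script can capture the log and reuse that value next time. The sketch below assumes the log contains a line ending in "--last-value <value>"; that layout is an assumption about the console output, and extract_last_value is a hypothetical helper name.

```shell
# extract_last_value: hypothetical helper, not part of Sqoop.
# Reads a captured Sqoop log on stdin and prints whatever follows
# "--last-value " on the final line that contains it.
extract_last_value() {
    sed -n 's/.*--last-value \(.*\)$/\1/p' | tail -n 1
}
```

One way to use it would be to redirect both output streams of the import into a file and then run extract_last_value on that file to obtain the argument for the next --last-value.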

Incrementally Importing Mutable Data

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table visits \
  --incremental lastmodified \
  --check-column last_update_date \
  --last-value "2013-05-22 01:01:01"

Preserving the Last Imported Value

sqoop job \
  --create visits \
  -- \
  import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table visits \
  --incremental append \
  --check-column id \
  --last-value 0

sqoop job --exec visits

Sqoop Job

• The Sqoop metastore is a powerful part of Sqoop that allows you to retain your job definitions and to easily run them anytime. Each saved job has a logical name that is used for referencing. You can list all retained jobs using the --list parameter:

sqoop job --list

• You can remove the old job definitions that are no longer needed with the --delete parameter, for example:

sqoop job --delete visits

• And finally, you can also view content of the saved job definitions using the --show parameter, for example:

sqoop job --show visits

• Output of the --show command will be in the form of properties. Unfortunately, Sqoop currently can’t rebuild the command line that you used to create the saved job.
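When jobs are created from scripts, it is handy to check whether a job name is already saved before calling --create again. The following sketch filters the output of sqoop job --list. It assumes that the listing prints one job name per line, possibly indented, which is an assumption about the output format; job_exists is a hypothetical helper name.

```shell
# job_exists: hypothetical helper, not part of Sqoop.
# $1 = job name. Reads the job listing on stdin, trims surrounding
# whitespace from each line, and succeeds on an exact whole-line match.
job_exists() {
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -Fqx -- "$1"
}
```

For example, the output of sqoop job --list could be piped into job_exists visits, and the create command run only when that check fails.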

Storing Passwords in the Metastore

<configuration>
  ...
  <property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
  </property>
</configuration>

Overriding the Arguments to a Saved Job

sqoop job --exec visits -- --verbose

Sharing the Metastore Between Sqoop Clients

sqoop job \
  --create cities \
  --meta-connect jdbc:hsqldb:hsql://master:16000/sqoop \
  -- \
  import \
  --connect jdbc:mysql://master:3306/sqoop \
  --username root \
  --password root \
  --table cities \
  --target-dir /alex/input/append \
  --incremental append \
  --check-column id \
  --last-value 1

sqoop-site.xml

<configuration>
  ...
  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:hsql://your-metastore:16000/sqoop</value>
  </property>
</configuration>

Free-Form Query Import

• The previous chapters covered the use cases where you had an input table on the source database system and you needed to transfer the table as a whole or one part at a time into the Hadoop ecosystem. This chapter, on the other hand, will focus on more advanced use cases where you need to import data from more than one table or where you need to customize the transferred data by calling various database functions.

Importing Data from Two Tables

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --query 'SELECT normcities.id,
                  countries.country,
                  normcities.city
           FROM normcities
           JOIN countries USING(country_id)
           WHERE $CONDITIONS' \
  --split-by id \
  --target-dir cities

Using Custom Boundary Queries

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --query 'SELECT normcities.id,
                  countries.country,
                  normcities.city
           FROM normcities
           JOIN countries USING(country_id)
           WHERE $CONDITIONS' \
  --split-by id \
  --target-dir cities \
  --boundary-query "select min(id), max(id) from normcities"
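The boundary query supplies the minimum and maximum of the split column, and each Sqoop process replaces the $CONDITIONS token with its own condition on that column. The sketch below is a simplified illustration of how such a range could be divided among mappers for an integer column. It is not Sqoop's actual splitting code, the real boundaries may differ, and split_ranges is a hypothetical name.

```shell
# split_ranges: hypothetical illustration, not Sqoop's own algorithm.
# $1 = column, $2 = min, $3 = max, $4 = number of mappers.
# Prints one range predicate per mapper; the variables are global
# because plain POSIX sh has no local keyword.
split_ranges() {
    col=$1; lo=$2; max=$3; n=$4
    size=$(( (max - lo + n) / n ))   # ceiling of (max - lo + 1) / n
    i=0
    while [ "$i" -lt "$n" ] && [ "$lo" -le "$max" ]; do
        hi=$(( lo + size ))
        if [ "$hi" -gt "$max" ]; then
            # Last slice: include the maximum itself.
            echo "$col >= $lo AND $col <= $max"
        else
            echo "$col >= $lo AND $col < $hi"
        fi
        lo=$hi
        i=$(( i + 1 ))
    done
}

split_ranges id 1 10 2
```

Each printed line stands for the kind of predicate that would take the place of $CONDITIONS in one mapper's copy of the query.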

Renaming Sqoop Job Instances

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --query 'SELECT normcities.id,
                  countries.country,
                  normcities.city
           FROM normcities
           JOIN countries USING(country_id)
           WHERE $CONDITIONS' \
  --split-by id \
  --target-dir cities \
  --mapreduce-job-name normcities

Importing Queries with Duplicated Columns

--query "SELECT \
  cities.city AS first_city, \
  normcities.city AS second_city \
  FROM cities \
  LEFT JOIN normcities USING(id)"

Export data to database

• The previous three chapters had one thing in common: they described various use cases of transferring data from a database server to the Hadoop ecosystem. What if you have the opposite scenario and need to transfer generated, processed, or backed-up data from Hadoop to your database? Sqoop also provides facilities for this use case, and the following recipes in this chapter will help you understand how to take advantage of this feature.

Transferring Data from Hadoop

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --export-dir cities

Inserting Data in Batches

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --export-dir cities \
  --batch

Inserting Data in Batches

sqoop export \
  -Dsqoop.export.records.per.statement=10 \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --export-dir cities

sqoop export \
  -Dsqoop.export.statements.per.transaction=10 \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --export-dir cities

Exporting with All-or-Nothing Semantics

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --staging-table staging_cities

Updating an Existing Data Set

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --update-key id

Updating or Inserting at the Same Time

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --update-key id \
  --update-mode allowinsert

Using Stored Procedures

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --call populate_cities

Exporting into a Subset of Columns

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --columns country,city

Encoding the NULL Value Differently

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --input-null-string '\\N' \
  --input-null-non-string '\\N'

Using Sqoop to import data into Hive

• Sqoop can import your data directly into Hive.

Importing Data Directly into Hive

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --hive-import

Using Partitioned Hive Tables

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --hive-import \
  --hive-partition-key day \
  --hive-partition-value "2013-05-22"
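For a daily import, the partition value would normally be computed rather than typed by hand. A minimal sketch, assuming the partition is named after the day on which the import runs; partition_day is a hypothetical helper name.

```shell
# partition_day: hypothetical helper, not part of Sqoop.
# Prints today's date as YYYY-MM-DD, the same layout as the
# --hive-partition-value shown above.
partition_day() {
    date '+%Y-%m-%d'
}

day=$(partition_day)
echo "would import into partition day=$day"
```

The import command could then pass --hive-partition-value "$day" in place of the fixed date.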

Replacing Special Delimiters During Hive Import

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --hive-import \
  --hive-drop-import-delims

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --hive-import \
  --hive-delims-replacement "SPECIAL"

Using the Correct NULL String in Hive

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --hive-import \
  --null-string '\\N' \
  --null-non-string '\\N'

Sqoop summary

• Sqoop depends on JDBC to connect to the database.
• Sqoop transfers put load on the source database and can affect its performance.
