Date post: | 26-Jan-2015 |
Category: | Technology |
Upload: | chen-gwen-shapira |
Building the Integrated Data Warehouse
With Oracle Database and Hadoop
Gwen Shapira, Senior Consultant
© 2012 – Pythian
Why Pythian
• Recognized Leader:
• Global industry leader in data infrastructure managed services and consulting with expertise in Oracle, Oracle Applications, Microsoft SQL Server, MySQL, big data and systems administration
• Work with over 200 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments
• Expertise:
• One of the world’s largest concentrations of dedicated, full-time DBA expertise. Employ 8 Oracle ACEs/ACE Directors
• Hold 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
• Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
About Gwen Shapira
• Oracle ACE Director
• 13 Years with pager
• 7 as Oracle DBA
• Senior Consultant:
• Has MacBook, will travel.
• @gwenshap
• http://www.pythian.com/news/author/shapira/
Agenda
• What is Big Data?
• Why do we care about Big Data?
• Why does your DWH need Hadoop?
• Examples of Hadoop in the DWH
• How to integrate Hadoop into your DWH
• Avoiding major pitfalls
What is Big Data?
MORE DATA THAN YOU CAN HANDLE
MORE DATA THAN RELATIONAL DATABASES CAN HANDLE
MORE DATA THAN RELATIONAL DATABASES CAN HANDLE CHEAPLY
Data arriving at fast rates
Typically unstructured
Stored without aggregation
Analyzed in real time
For reasonable cost
Where does Big Data come from?
• Social media
• Enterprise transactional data
• Consumer behaviour
• Multimedia
• Sensors and embedded devices
• Network devices
Why all the Excitement?
Complex Data Architecture
Your DWH needs Hadoop
Big Problems with Big Data
• It is:
• Unstructured
• Unprocessed
• Un-aggregated
• Un-filtered
• Repetitive
• And generally messy.
Oh, and there is a lot of it.
Technical Challenges
• Storage capacity
• Storage throughput
• Pipeline throughput
• Processing power
• Parallel processing
• System Integration
• Data Analysis
Scalable storage
Massive Parallel Processing
Ready to use tools
Hadoop Principles
Bring Code to Data
Share Nothing
Hadoop in a Nutshell
Map-Reduce: Framework for writing massively parallel jobs
HDFS: Replicated Distributed Big-Data File System
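The Map-Reduce half of this slide can be pictured with a toy, single-process sketch. This is not Hadoop code: the function names (map_phase, shuffle, reduce_phase) and the sample input are made up for illustration, and a real job would run the map and reduce steps in parallel on the nodes that hold the HDFS blocks.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step the framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Map every line, then shuffle, then reduce each key independently.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input lines, standing in for file blocks stored in HDFS.
counts = word_count(["big data big cluster", "Big warehouse"])
```

Because each map call sees only its own line and each reduce call sees only its own key, the work can be spread across machines that share nothing.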
Hadoop Benefits
• Reliable solution based on unreliable hardware
• Designed for large files
• Load data first, structure later
• Designed to maximize throughput of large scans
• Designed to maximize parallelism
• Designed to scale
• Flexible development platform
• Solution Ecosystem
Hadoop Limitations
• Hadoop is scalable but not fast
• Batteries not included
• Instrumentation not included either
• Well-known reliability limitations
Use Cases and Customer Stories
Hadoop in the Data Warehouse
ETL for Unstructured Data
Logs (web servers, app servers, clickstreams) → Flume → Hadoop (cleanup, aggregation, long-term storage) → DWH (BI, batch reports)
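The cleanup and aggregation stage of this pipeline might look roughly like the sketch below. The log layout ("timestamp status url"), the sample lines, and the function names are all hypothetical; in practice this logic would run as a Hadoop job over files delivered by Flume, and only the small aggregated result would be loaded into the DWH.

```python
from collections import Counter

# Hypothetical raw log lines; real web server formats will differ.
RAW_LOGS = [
    "2012-01-01T10:00:00 200 /home",
    "2012-01-01T10:00:01 404 /missing",
    "garbage line",
    "2012-01-01T10:00:02 200 /home",
    "2012-01-01T10:00:03 200 /cart",
]


def parse(line):
    """Cleanup: return a structured record, or None for malformed lines."""
    parts = line.split()
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    timestamp, status, url = parts
    return {"day": timestamp[:10], "status": int(status), "url": url}


def aggregate(lines):
    """Aggregation: count successful hits per (day, url)."""
    hits = Counter()
    for line in lines:
        record = parse(line)
        if record and record["status"] == 200:
            hits[(record["day"], record["url"])] += 1
    return hits


daily_hits = aggregate(RAW_LOGS)
```

The raw lines stay in Hadoop for long-term storage, while the per-day counts are the kind of compact, structured output that suits BI and batch reports.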
ETL for Structured Data
OLTP (Oracle, MySQL, Informix…) → Sqoop, Perl → Hadoop (transformation, aggregation, long-term storage) → DWH (BI, batch reports)
Bring the World into Your Datacenter
Rare Historical Report
Find Needle in Haystack
We are not doing SQL anymore
Connecting the (big) Dots
Sqoop Queries
Sqoop is Flexible (for import)
• Select <columns> from <table> where <condition>
• Or <write your own query>
• Split column
• Parallel
• Incremental
• File formats
Sqoop Import Examples
• sqoop import --connect jdbc:oracle:thin:@//dbserver:1521/masterdb --username hr --table emp --where "start_date > '01-01-2012'"
• sqoop import --connect jdbc:oracle:thin:@//dbserver:1521/masterdb --username myuser --table shops --split-by shop_id --num-mappers 16
The split column (shop_id here) must be indexed or partitioned to avoid 16 full table scans
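To see why the split column matters, it helps to picture how a range of shop_id values could be carved up among 16 mappers. The sketch below is a simplified illustration, not Sqoop's own splitter: the helper names and the id range 1 to 1600 are assumptions, and Sqoop's real boundary arithmetic may differ in detail.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide [min_id, max_id] into contiguous, non-overlapping ranges."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` ranges take one additional id each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer ids than mappers
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Build the range predicate each mapper would add to its query."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


# Hypothetical boundaries: assume shop_id runs from 1 to 1600.
ranges = split_ranges(1, 1600, 16)
clauses = where_clauses("shop_id", ranges)
```

Each mapper runs its own query with one of these predicates. Without an index or partitioning on the split column, the database may have to scan the whole table once per mapper to satisfy them.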
Less Flexible Export
• 100 row batch inserts
• Commit every 100 batches
• Parallel export
• Update mode
Example:
sqoop export --connect jdbc:oracle:thin:@//dbserver:1521/masterdb --table bar --export-dir /results/bar_data
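The two batching numbers on this slide combine into a commit interval that is worth working out when sizing undo and planning for partial failures. The sketch below is plain arithmetic under those two figures; plan_export is a hypothetical helper, and it describes a single export stream, whereas a parallel export runs several such streams at once.

```python
ROWS_PER_BATCH = 100      # from the slide: 100 row batch inserts
BATCHES_PER_COMMIT = 100  # from the slide: commit every 100 batches


def plan_export(total_rows):
    """Return (insert_batches, commits) needed by one stream for total_rows."""
    batches = -(-total_rows // ROWS_PER_BATCH)   # ceiling division
    commits = -(-batches // BATCHES_PER_COMMIT)  # ceiling division
    return batches, commits


rows_per_commit = ROWS_PER_BATCH * BATCHES_PER_COMMIT
batches, commits = plan_export(1_000_000)
```

Since work is committed in chunks rather than as one transaction, an export that fails midway can leave some rows already committed in the target table.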
Fuse-DFS
• Mount HDFS on Oracle server:
• sudo yum install hadoop-0.20-fuse
• hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
• Use external tables to load data into Oracle
• File Formats may vary
• All ETL best practices apply
Oracle Loader for Hadoop
• Load data from Hadoop into Oracle
• Map-Reduce job inside Hadoop
• Converts data types
• Partitions and sorts
• Direct path loads
• Reduces CPU utilization on database
Oracle Direct Connector to HDFS
• Create external tables of files in HDFS
• PREPROCESSOR HDFS_BIN_PATH:hdfs_stream
• All the features of External Tables
• Tested (by Oracle) as 5 times faster (GB/s) than FUSE-DFS
Big Data Appliance and Exadata
How not to Fail
Data that belongs in RDBMS
Prepare for Migration
Use Hadoop Efficiently
• Understand your bottlenecks:
• CPU, storage or network?
• Reduce use of temporary data:
• All data is over the network
• Written to disk in triplicate.
• Eliminate unbalanced workloads
• Offload work to RDBMS
• Fine-tune optimization with Map-Reduce
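Two of the points above lend themselves to quick back-of-the-envelope checks. The sketch below is illustrative arithmetic only: the function names and sample figures are made up, the replication factor of 3 comes from "written to disk in triplicate", and the network estimate assumes the first replica is written on the local node.

```python
REPLICATION = 3  # "written to disk in triplicate"


def temp_data_cost(gb):
    """Disk and network volume (GB) for intermediate output stored in HDFS."""
    disk_written = gb * REPLICATION
    network_sent = gb * (REPLICATION - 1)  # assumption: first replica is local
    return disk_written, network_sent


def skew_ratio(rows_per_reducer):
    """Slowest reducer's load relative to a perfectly balanced load."""
    balanced = sum(rows_per_reducer) / len(rows_per_reducer)
    return max(rows_per_reducer) / balanced


# Hypothetical job: 500 GB of temporary data, four reducers with one hot key.
cost = temp_data_cost(500)
ratio = skew_ratio([100, 100, 100, 700])
```

A job finishes only when its slowest task does, so a skew ratio well above 1 means most of the cluster sits idle while one reducer works through the hot key.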
Your Data is NOT as BIG
as you think
Getting Started
• Pick a Business Problem
• Acquire Data
• Use the right tool for the job
• Hadoop can start on the cheap
• Integrate the systems
• Analyze data
• Get operational
Thank you and Q&A
To contact us…
1-877-PYTHIAN
To follow us…
http://www.pythian.com/news/
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
@pythianjobs
http://www.linkedin.com/company/pythian