Benchmarking Hive at Yahoo Scale
PRESENTED BY Mithun Radhakrishnan | June 4, 2014
2014 Hadoop Summit, San Jose, California
About myself
HCatalog Committer, Hive contributor
› Metastore, Notifications, HCatalog APIs
› Integration with Oozie, Data Ingestion
Other odds and ends
› DistCp
mithun@apache.org
About this talk
Introduction to “Yahoo Scale”
The use-case in Yahoo
The Benchmark
The Setup
The Observations (and, possibly, lessons)
Fisticuffs
The Y!Grid
16 Hadoop Clusters in the Y!Grid
› 32,500 Nodes
› 750K jobs a day
Hadoop 0.23.10.x, 2.4.x
Large Datasets
› Daily, hourly, minute-level frequencies
› Terabytes of data, 1000s of files, per dataset instance
Pig 0.11
Hive 0.10 / HCatalog 0.5
› => Hive 0.12
Data Processing Use Cases
Pig for Data Pipelines
› Imperative paradigm
› ~45% Hadoop Jobs on Production Clusters
• M/R + Oozie = 41%
Hive for Ad hoc Queries
› SQL
› Relatively smaller number of jobs
• *Major* Uptick
Use HCatalog for Inter-op
Hive is Currently the Fastest Growing Product on the Grid
[Chart: All Grid Jobs (in Millions) and Hive Jobs (% of All Jobs), Mar-13 through May-14]
2.4 million Hive jobs
Business Intelligence Tools
{Tableau, MicroStrategy, Excel, …}
Challenges:
› Security
  • ACLs, Authentication, Encryption over the wire, Full-disk Encryption
› Bandwidth
  • Transporting results over ODBC
› Query Latency
  • Query execution time
  • Cost of query “optimizations”
  • “Bad” queries
The Benchmark
TPC-H
› Industry standard (tpc.org/tpch)
› 22 queries
› dbgen -s 1000 -S 3
  • Parallelizable
Reynold Xin’s excellent work:
› https://github.com/rxin
› Transliterated queries to suit Hive 0.9
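dbgen's chunking flags make generation embarrassingly parallel: each worker emits one chunk of a table. A minimal local sketch (assuming a compiled dbgen binary on the PATH; the deck actually ran this on MapReduce):

```shell
# Generate the scale-factor-1000 (~1 TB) LINEITEM table in 3 chunks, in parallel.
# -s: scale factor, -C: total chunk count, -S: this worker's chunk (1-based), -T L: lineitem only
for chunk in 1 2 3; do
  dbgen -s 1000 -C 3 -S "$chunk" -T L &
done
wait
```

Each invocation writes its own `lineitem.tbl.N` fragment, so the fragments can be loaded into HDFS independently.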
Relational Diagram
[Diagram: the standard TPC-H schema, with row counts per table at scale factor SF]
PART (P_): SF × 200,000 rows
PARTSUPP (PS_): SF × 800,000 rows
LINEITEM (L_): SF × 6,000,000 rows
ORDERS (O_): SF × 1,500,000 rows
CUSTOMER (C_): SF × 150,000 rows
SUPPLIER (S_): SF × 10,000 rows
NATION (N_): 25 rows
REGION (R_): 5 rows
The Setup
› 350 Node cluster
  • Xeon boxen: 2 sockets with E5530s => 16 CPUs
  • 24GB memory
    – NUMA enabled
  • 6 SATA drives, 2TB, 7200 RPM Seagates
  • RHEL 6.4
  • JRE 1.7 (-d64)
  • Hadoop 0.23.7+/2.3+, Security turned off
  • Tez 0.3.x
  • 128MB HDFS block-size
› Downscale tests: 100 Node cluster
  • hdfs-balancer.sh
The Prep
Data generation:
› Text data: dbgen on MapReduce
› Transcode to RCFile and ORC: Hive on MR
  • insert overwrite table orc_table partition( … ) select * from text_table;
› Partitioning:
  • Only for 1TB, 10TB cases
  • Perils of dynamic partitioning
› ORC File:
  • 64MB stripes, ZLIB Compression
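The transcode step above can be sketched as follows. Table names, columns, and the partition column are illustrative, not the deck's actual DDL; the dynamic-partition settings are the standard Hive switches that the "perils" bullet alludes to:

```sql
-- Illustrative sketch: transcode a text table into a partitioned ORC table.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE lineitem_orc (l_orderkey bigint, l_quantity double /* ... */)
PARTITIONED BY (ship_year string)
STORED AS orc
TBLPROPERTIES ("orc.compress" = "ZLIB");

-- Dynamic partitioning: ship_year comes last in the SELECT list.
INSERT OVERWRITE TABLE lineitem_orc PARTITION (ship_year)
SELECT l_orderkey, l_quantity /* ... */, ship_year FROM lineitem_text;
```

At 10 TB, each dynamic partition multiplies open writers and small files, which is why partitioning was limited to the 1TB/10TB runs.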
Observations
100 GB
› 18x speedup over Hive 0.10 (Textfile)
  • 6-50x
› 11.8x speedup over Hive 0.10 (RCFile)
  • 5-30x
› Average query time: 28 seconds
  • Down from 530 (Hive 0.10 Text)
› 85% of queries completed in under a minute
1 TB
› 6.2x speedup over Hive 0.10 (RCFile)
  • Between 2.5-17x
› Average query time: 172 seconds
  • Between 5-947 seconds
  • Down from 729 seconds (Hive 0.10 RCFile)
› 61% of queries completed in under 2 minutes
› 81% of queries completed in under 4 minutes
10 TB
› 6.2x speedup over Hive 0.10 (RCFile)
  • Between 1.6-10x
› Average query time: 908 seconds (426 seconds excluding outliers)
  • Down from 2129 seconds with Hive 0.10 RCFile
    – (1712 seconds excluding outliers)
› 61% of queries completed in under 5 minutes
› 71% of queries completed in under 10 minutes
› Q6 still completes in 12 seconds!
Explaining the speed-ups
Hadoop 2.x, et al.
Tez
› (Arbitrary DAG)-based Execution Engine
› “Playing the gaps” between M&R
  • Temporary data and the HDFS
› Feedback loop
› Smart scheduling
› Container re-use
› Pipelined job start-up
Hive
› Statistics
› “Vector-ized” Execution
ORC
› PPD (Predicate Push-Down)
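As a sketch, the Hive-side switches corresponding to the items above look roughly like this (property names from Hive 0.13-era configuration; defaults vary by version, so treat these as assumptions to verify against your release):

```sql
-- Hedged sketch: session settings enabling the speedup features above.
SET hive.execution.engine=tez;                -- run DAGs on Tez instead of MapReduce
SET hive.vectorized.execution.enabled=true;   -- vectorized operator pipeline
SET hive.compute.query.using.stats=true;      -- answer simple aggregates from statistics
SET hive.optimize.ppd=true;                   -- push predicates down toward the readers
SET hive.optimize.index.filter=true;          -- use ORC row-group indexes to skip strides
```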
ORC File Layout
› Data is composed of multiple streams per column
› Index allows for skipping rows (defaults to every 10,000 rows), keeping position in each stream, and min/max for each column
› Footer contains a directory of stream locations, and the encoding for each column
› Integer columns are serialized using run-length encoding
› String columns are serialized using a dictionary for column values, and the same run-length encoding
› Stripe footer is used to find the requested column’s data streams; adjacent stream reads are merged
ORC Usage
CREATE TABLE addresses (
  name   string,
  street string,
  city   string,
  state  string,
  zip    int
)
STORED AS orc
LOCATION '/path/to/addresses'
TBLPROPERTIES ("orc.compress" = "ZLIB");

ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT orc;

SET hive.default.fileformat = orc;
SET hive.exec.orc.memory.pool = 0.50;  -- ORC writer is allowed 50% of JVM heap size by default

ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
Key                   Default              Comments
orc.compress          ZLIB                 High-level compression (one of NONE, ZLIB, SNAPPY)
orc.compress.size     262,144 (256 KB)     Number of bytes in each compression chunk
orc.stripe.size       67,108,864 (64 MB)   Number of bytes in each stripe. Each ORC stripe is processed in one map task (try 32 MB to cut down on disk I/O)
orc.row.index.stride  10,000               Number of rows between index entries (must be >= 1,000). A larger stride increases the probability of not being able to skip a stride for a given predicate
orc.create.index      true                 Whether to create row indexes, used for predicate push-down. If data is frequently accessed/filtered on a certain column, sorting on that column and using index filters makes column filters work faster
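Pulling the knobs above into one statement, a table tuned for smaller stripes might look like this. The table and column names are illustrative, not from the deck:

```sql
-- Illustrative only: a fact table tuned per the property table above.
CREATE TABLE lineitem_orc (
  l_orderkey bigint,
  l_quantity double,
  l_shipdate string
)
STORED AS orc
TBLPROPERTIES (
  "orc.compress"         = "ZLIB",
  "orc.stripe.size"      = "33554432",  -- 32 MB, down from the 64 MB default, to cut disk I/O
  "orc.row.index.stride" = "10000",     -- one index entry (with min/max) per 10,000 rows
  "orc.create.index"     = "true"       -- enable row indexes for predicate push-down
);
```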
Configuring ORC
set hive.merge.mapredfiles=true
set hive.merge.mapfiles=true
set orc.stripe.size=67108864
› Half the HDFS block-size
  • Prevent cross-block stripe-reads
  • Tangent: DistCp
set orc.compress=???
› Depends on size and distribution
› Snappy compression hasn’t been explored
YMMV
› Experiment
Conclusions
Y!Grid sticking with Hive
Familiarity
› Existing ecosystem
Community
Scale
Multi-tenancy
Coming down the pike
› CBO (Cost-Based Optimization)
› In-memory caching solutions atop HDFS
• RAMfs a la Tachyon?
We’re not done yet
SQL compliance
Scaling up the metastore performance
Better BI Tool integration
Faster transport
› HiveServer2 result-sets
References
The YDN blog post:
› http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn
Code:
› https://github.com/mythrocks/hivebench (TPC-H scripts, datagen, transcode utils)
› https://github.com/t3rmin4t0r/tpch-gen (Parallel TPC-H generation)
› https://github.com/rxin/TPC-H-Hive (TPC-H scripts for Hive)
› https://issues.apache.org/jira/browse/HIVE-600 (Yuntao’s initial TPC-H JIRA)
Thank You
@mithunrk
mithun@apache.org
We are hiring!
Stop by Kiosk P9 or reach out to us at bigdata@yahoo-inc.com.
I’m glad you asked.
Sharky comments
Testing with Shark 0.7.x and Shark 0.8
› Compatible with Hive Metastore 0.9
› 100GB datasets: Admirable performance
› 1TB/10TB: Tests did not run completely
  • Failures, especially in 10TB cases
  • Hangs while shuffling data
  • Scaled back to 100 nodes -> More tests ran through, but not completely
› nReducers: Not inferred
Miscellany
› Security
› Multi-tenancy
› Compatibility