
Invisible loading

Date post:01-Dec-2014
Description:
Invisible Loading Talk by Azza Abouzied at the VLDB Workshop on End-to-end Management of Big Data 2012
Transcript:
  • 1. Invisible Loading Yalies: Azza Abouzied, Daniel Abadi, Avi Silberschatz BigData 2012
  • 2. Problem: The Crying Baby
  • 3. Two ways to deal with this: (a) immediate gratification, long-term $$$ costs; (b) misery & sleep deprivation, long-term benefits.
  • 4. The Crying Baby Problem: Wants Attention Now! The Impatient Boss Problem: Wants Answers Now!
  • 5. Two ways to analyze data. MapReduce way (immediate gratification; long-term cumulative costs, because MR is slow!): Hack it: Locate File, Determine Key Attributes, Parse + Map + Reduce. DB & HadoopDB way (misery & sleep deprivation; long-term benefits): Locate File, Figure out schema, Load File, Organize or Index DB tables, then Query: Determine Key Attributes, Process without Parse.
  • 6. The Problem: Can we get the immediate gratification of working with MapReduce and make progress towards the performance advantages of working with databases?
  • 7. Our Solution: Begin with the MapReduce way. File System: Locate File, Determine Key Attributes, Write Map/Reduce Scripts, Run it! Database System (behind the scenes, per job, incrementally): Figure out schema, Load File, Organize or Index DB tables.
  • 8. P1) Figure out schema: How to automatically figure out a schema? Short answer: DON'T. Split the map phase into Parse and Map phases. Enforce a simple Parse API: the Parser has one output method, getField(int id). Name a table after its Parser implementation and label attributes with their field id. Different parsers on the same file result in different tables.
  • 9. P2) Load File incrementally: How to load files with minimal marginal costs? Load only touched attributes (vertical partition): requires a column-store. Load only parts of a column (horizontal partition): after a file-split is processed by Map, its touched attributes are loaded entirely. The number of splits per file is a tunable parameter.
  • 10. Tuple construction: Some columns are at different loading stages. Maintain OIDs for each column: an address column. The OIDs assigned are equivalent to the insertion order. Keep a catalog to track loading progress. (Figure: columns a, b, c, d at different loading stages, labeled "Process in DB" and "Use File System".)
  • 11. P3) Organize incrementally: How to index a partially loaded table? If a selection filter is applied on an attribute, we organize it. Dealing with partially loaded attributes: (Figure: columns c1 and c2, each paired with an address column and combined by a JOIN; values not recoverable.)
  • 12. Choosing an organization strategy: Why not use merge sort? (Figure: merge sort example with panels labeled "Load" and "Sort", showing the column in the file system and in the database; values not recoverable.)
  • 13. Incremental Merge Sort (Figure: a column after loading and after successive sort phases; values not recoverable.)
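
The loading scheme on slides 8 to 10 can be illustrated with a small sketch. It is not the authors' implementation. In-memory lists stand in for both the file system and the column store, and the names `CsvParser`, `TrackingParser`, `InvisibleLoader`, `run_job`, `rows_in_db`, and `splits_per_job` are hypothetical. Only the single output method (`get_field`, mirroring `getField(int id)`), the table named after the parser, the per-attribute loading, the insertion-order OIDs, and the catalog come from the slides. As a simplification, every job here parses all rows from the file and loads afterwards, whereas the slides describe loading a split once Map has processed it.

```python
class CsvParser:
    """Hypothetical parser: its only output method is get_field."""

    def __init__(self, line: str) -> None:
        self._fields = line.split(",")

    def get_field(self, field_id: int) -> str:
        return self._fields[field_id]


class TrackingParser:
    """Wraps a parser and records which field ids the map function touches."""

    def __init__(self, parser, touched: set) -> None:
        self._parser = parser
        self._touched = touched

    def get_field(self, field_id: int) -> str:
        self._touched.add(field_id)
        return self._parser.get_field(field_id)


class InvisibleLoader:
    def __init__(self, parser_cls, splits) -> None:
        self.table_name = parser_cls.__name__  # table named after its parser
        self.parser_cls = parser_cls
        self.splits = splits                   # file splits: lists of raw lines
        self.columns = {}                      # field id -> values; position is the OID
        self.loaded_splits = {}                # catalog: field id -> splits loaded so far

    def run_job(self, map_fn, splits_per_job: int = 1):
        """Run map_fn over the file, then load more of each touched attribute."""
        touched = set()
        results = []
        for split in self.splits:
            for line in split:
                parser = TrackingParser(self.parser_cls(line), touched)
                results.append(map_fn(parser))
        for field_id in sorted(touched):
            self._load_next_splits(field_id, splits_per_job)
        return results

    def _load_next_splits(self, field_id: int, count: int) -> None:
        # Vertical partition: one attribute. Horizontal partition: a few splits.
        start = self.loaded_splits.get(field_id, 0)
        stop = min(start + count, len(self.splits))
        column = self.columns.setdefault(field_id, [])
        for split in self.splits[start:stop]:
            # Appending in file order keeps OID == insertion order.
            column.extend(self.parser_cls(line).get_field(field_id) for line in split)
        self.loaded_splits[field_id] = stop

    def rows_in_db(self, field_ids):
        """Rebuild tuples by joining columns on OID, up to the shortest column."""
        loaded = min((len(self.columns.get(f, [])) for f in field_ids), default=0)
        return [tuple(self.columns[f][oid] for f in field_ids) for oid in range(loaded)]


# Two splits of a small, made-up file.
splits = [["1,ann,10", "2,bob,20"], ["3,cat,30", "4,dan,40"]]
loader = InvisibleLoader(CsvParser, splits)
loader.run_job(lambda p: int(p.get_field(2)))                     # touches field 2
loader.run_job(lambda p: (p.get_field(0), int(p.get_field(2))))   # touches fields 0 and 2
```

After the two jobs, field 2 is fully loaded while field 0 has only its first split in the store, so the columns sit at different loading stages. `rows_in_db([0, 2])` can reconstruct only the first two tuples from the store, and the remaining rows would still be read from the file.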