Invisible loading
Date post:01-Dec-2014
Category:Technology
Description:
Invisible Loading: a talk by Azza Abouzied at the VLDB Workshop on End-to-end Management of Big Data, 2012
Transcript:
<ul><li> 1. Invisible Loading. Yalies: Azza Abouzied, Daniel Abadi, Avi Silberschatz. BigData 2012 </li>
<li> 2. Problem: The Crying Baby </li>
<li> 3. Two ways to deal with this. Immediate gratification: long-term $$$ costs. Misery &amp; sleep deprivation: long-term benefits. </li>
<li> 4. The Crying Baby Problem: Wants Attention Now! The Impatient Boss Problem: Wants Answers Now! </li>
<li> 5. Two ways to analyze data. MapReduce way (immediate gratification). Hack it: Locate File, Determine Key Attributes, Parse + Map + Reduce. Long-term cumulative costs, because MR is slow! DB &amp; HadoopDB way (misery &amp; sleep deprivation): Locate File, Figure out schema, Load File, Organize or Index DB tables. Query: Determine Key Attributes, Process in DB without Parse. Long-term benefits. </li>
<li> 6. The Problem: Can we get the immediate gratification of working with MapReduce and make progress towards the performance advantages of working with databases? </li>
<li> 7. Our Solution: Begin with the MapReduce Way. File System: Locate File, Determine Key Attributes, Write Map/Reduce Scripts, Run it! Database System, BEHIND THE SCENES, PER JOB, INCREMENTALLY: Figure out schema, Load File, Organize or Index DB tables. </li>
<li> 8. P1) Figure out schema: How to automatically figure out a schema? Short answer: DON'T. Split the map phase into Parse and Map phases. Enforce a simple Parse API: the Parser has one output method, getField(int id). Name a table after its Parser implementation and label attributes with their field id. Different parsers on the same file result in different tables. </li>
<li> 9. P2) Load File, incrementally: How to load files with minimal marginal costs? Load only touched attributes (vertical partition); this requires a column-store. Load only parts of a column (horizontal partition). After a file-split is processed by Map, its touched attributes are loaded entirely. The number of splits per file is a tunable parameter. </li>
<li> 10. Tuple construction: Some columns are at different loading stages. 
Maintain OIDs for each column: an address column. The OIDs assigned are equivalent to the insertion order. Keep a catalog to track loading progress. [Figure: columns a, b, c, d, with regions labeled "Process in DB" and "Use File System".] </li>
<li> 11. P3) Organize, incrementally: How to index a partially-loaded table? If a selection filter is applied on an attribute, we organize it. Dealing with partially loaded attributes: [Figure: columns c1 and c2, each with an address column, combined by a JOIN; cell values not recoverable.] </li>
<li> 12. Choosing an organization strategy. Why not use merge sort? [Figure: merge sort illustrated on a column; labels not recoverable.] </li>
<li> 13. Incremental Merge Sort. [Figure: incremental merge sort illustrated on a column across phases; labels not recoverable.] </li>
<li> 14. EVALUATION </li>
<li> 15. Setup: Single-machine experiments. Embarrassingly parallel: no distributed reorganization or partitioning. MonetDB (hacked to support IMS). Hadoop. 2 GB file of 5 integer attributes: 107,374,182 tuples. See paper for more details. </li>
<li> 16. 
The big picture. [Figure: time in seconds (0 to 800) vs. job sequence (ticks at 1, 10, 100). Series: SQL Pre-load, Incremental Reorganization (5/5), Incremental Reorganization (2/5), Invisible Loading (5/5), Invisible Loading (2/5), MapReduce.] </li>
<li> 17. Cumulative costs. [Figure: cumulative time spent in seconds (ticks at 100, 1000, 10000, 100000) vs. job sequence (ticks at 1, 10, 100). Series: SQL Pre-load, Incremental Reorganization (5/5), Incremental Reorganization (2/5), Invisible Loading (5/5), Invisible Loading (2/5), MapReduce.] </li>
<li> 18. Change the access pattern. [Figure: time in seconds (0 to 800) vs. job sequence, log scale for jobs 1 to 10 and linear scale for jobs 83 to 93. Series: SQL Pre-load, Incremental Reorganization (5/5), Incremental Reorganization (2/5), Invisible Loading (5/5), Invisible Loading (2/5), MapReduce.] </li>
<li> 19. Further Evaluation (Paper): In-depth study of IMS. Comparison with Cracking and Pre-sorting. Effect of integrating lightweight compressions into IMS. Little mini-experiments: Insertion vs. Copy; Processing in DB vs. using DB as a fast access medium with all processing in MapReduce. </li>
<li> 20. Conclusion: Lessons Learned. Engineering nightmare. Many complementing technologies: Manimal, Adaptive Merging. In the era of Big Data we need to design more modular, plug-n-play tools. Can of worms: most BigData problems look deceptively...</li></ul>
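The Parse API, per-split loading, and loading catalog from slides 8 to 10 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `CsvParser`, `ColumnStore`, `run_job`, and the `splits_to_load` parameter are hypothetical names, the in-memory store stands in for a real column-store, and every job here still reads from the file rather than switching to the database for already-loaded data.

```python
from collections import defaultdict


class CsvParser:
    """Hypothetical parser for comma-separated integer records."""

    def __init__(self, line):
        self._fields = line.split(",")
        self.touched = set()  # field ids requested through get_field

    def get_field(self, field_id):
        # Single output method, mirroring getField(int id) from the slides.
        self.touched.add(field_id)
        return int(self._fields[field_id])


class ColumnStore:
    """Toy in-memory stand-in for a column-store (an assumption)."""

    def __init__(self):
        # table name -> field id -> list of (oid, value) pairs
        self.columns = defaultdict(lambda: defaultdict(list))
        # catalog: table name -> field id -> set of loaded split indexes
        self.catalog = defaultdict(lambda: defaultdict(set))

    def load(self, table, field_id, split_index, first_oid, values):
        if split_index in self.catalog[table][field_id]:
            return  # this part of the column is already loaded
        column = self.columns[table][field_id]
        # OIDs follow the record order in the file, so columns loaded at
        # different times can still be aligned into tuples.
        column.extend((first_oid + i, v) for i, v in enumerate(values))
        self.catalog[table][field_id].add(split_index)


def run_job(splits, parser_cls, map_fn, store, splits_to_load):
    """Run map_fn over every split; load touched attributes of a few splits."""
    table = parser_cls.__name__  # the table is named after its parser
    results, first_oid, budget = [], 0, splits_to_load
    for split_index, lines in enumerate(splits):
        parsers = [parser_cls(line) for line in lines]
        results.extend(map_fn(p) for p in parsers)
        touched = set().union(*(p.touched for p in parsers))
        missing = [f for f in touched
                   if split_index not in store.catalog[table][f]]
        if missing and budget > 0:
            for f in missing:
                values = [p.get_field(f) for p in parsers]
                store.load(table, f, split_index, first_oid, values)
            budget -= 1  # tunable: how many splits each job may load
        first_oid += len(lines)
    return results


# Two splits of a three-attribute file; both jobs touch only field 1.
splits = [["1,10,100", "2,20,200"], ["3,30,300", "4,40,400"]]
store = ColumnStore()
r1 = run_job(splits, CsvParser, lambda p: p.get_field(1), store, splits_to_load=1)
r2 = run_job(splits, CsvParser, lambda p: p.get_field(1), store, splits_to_load=1)
```

After the first job only split 0 of field 1 is in the store; the second job adds split 1. Fields 0 and 2 are never loaded because no map function touched them, which is the vertical-partition behavior the slides describe.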