
Keeping up with the architects.

Andrew Warfield, UBC and Coho Data

About this keynote. (And the things I'm not going to talk about.)

Not going to talk about any of this stuff right now (but happy to in the hallway track):
• Finished PhD at Cambridge in 2006
• Worked in industrial research (AT&T and Intel)
• Two startups (XenSource and Coho Data)
• Associate prof at UBC
• Three kids
• I went heli-skiing last Friday.

Here's what I am going to do

• Make some pretty obvious observations about technology directions.
• Draw some dodgy and highly speculative conclusions from those observations.
• Try to influence your research.

• Disclaimer: this is not a conference talk, nor is it 5 stapled-together conference talks.
• Another disclaimer: I'm going to give you more problems than solutions.

So let's go…

Section 5: Evaluation.

• (At the end of the day, all systems papers are about performance.)
• Probably because it's one of the only things we know how to measure.
• There are two types of performance results:

1. Small improvements in a very large system.
2. Speedups that are so significant that they change functionality.

• Google and Facebook and Amazon and Microsoft are probably a lot better at solving meaningful problems with their systems than you are.

Here are the high-level trends/ideas behind this talk:
1. Diminishing scarcity.
2. Practical/sensible to own your own hardware again.
3. The software we have is turning out to be a bigger, slower, more onerous burden than the hardware it runs on.
• It is a poor match for changing performance and failure characteristics of hardware.
• It is a poor match for the operational needs of users.

Consequences of these ideas

• The goalposts are moving in terms of what we design systems for.

• Human costs associated with running our systems are a bigger expense and inconvenience, at all levels, than the piecewise performance of components.
• They are actually a barrier.

• The end of scarcity marks the beginning of a push for efficient predictability.
• This is why storage customers buy flash. It's also a hard systems problem.

So what do we need to understand, as systems researchers, to help?

One significant hardware change: Rack scale


This is a Google datacenter circa 2001. GFS (2003): the largest deployments had over 1,000 storage nodes, hundreds of clients, and 300 TB of storage space.

http://itq.nl/intels-take-open-compute-project-rack-scale-architecture/

https://www.supermicro.com/solutions/SRSD.cfm

What is "rack scale"?

• Everything in a rack will share a high-performance bus.
• Within a rack, optical interconnects are expected to reach terabit bandwidth in the near term, with sub-microsecond latencies.

• The server as we know it will be completely disaggregated.
• CPUs, GPUs, storage, network interfaces, and volatile memory will each move to independent physical enclosures. Arbitrary composition and independent scale.

• Rack resources will be very dense.
• Like, really dense.
• As a ballpark, within a rack we are likely to see thousands of cores, tens of petabytes of persistent memory, and terabytes of RAM.

• In short, a single datacenter rack with a capital value in the low millions of dollars will be as capable as entire first-generation (e.g. 2003-era) "warehouse" datacenters from public cloud providers.

Consequences of the rack-scale trend on software design.

What's changing?

1. Storage is becoming dense.
• Problematically dense!
2. The memory hierarchy is having an identity crisis.
3. Application latency is a cruel taskmaster.

Trend 1: Dense nonvolatile storage capacity.

Dense Nonvolatile Capacity

• Flash vendors have finally started to relax about the durability problem.
• The jaw-dropping bit: we will see 4 PB in 1u in a small number of years.
• At a price that approaches spinning disk.

• The bad news: in the immediate term, interconnection will be a problem.
• And in the longer term it may not get a whole lot better.

Trends

SSD      Cap / 1u   Throughput per data
2 TB     64 TB      312 MB/s/TB
8 TB     256 TB     78 MB/s/TB
32 TB    1 PB       20 MB/s/TB
128 TB   4 PB       5 MB/s/TB

NVMe device: x4 PCIe. Broadwell CPU: 40 PCIe lanes.
TOR cross-rack links typically oversubscribed at 3 or 4:1.
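The trend in the table falls out of simple arithmetic: per-device bandwidth stays roughly flat while per-device capacity quadruples. A quick sketch reproduces it (the ~625 MB/s per-device throughput and 32 drive bays per 1u are assumptions chosen to match the table's numbers):

```python
# Reproducing the table's arithmetic. Assumptions: a fixed ~625 MB/s of
# throughput per device, and 32 drive bays per 1u enclosure.
DEV_MBPS = 625
BAYS = 32

for dev_tb in [2, 8, 32, 128]:
    cap_tb = dev_tb * BAYS              # total capacity per 1u
    per_tb = DEV_MBPS / dev_tb          # bandwidth per stored TB
    print(f"{dev_tb:>4} TB SSD -> {cap_tb:>5} TB per 1u, {per_tb:.0f} MB/s/TB")
```

Quadruple the device capacity and the bandwidth available per stored terabyte drops by 4x: that is the interconnection problem in one loop.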

This is very different from all the storage systems that we've built in the past.

• No seek penalty.
• Means that background I/O is actually reasonable to do.
• Migration for performance.
• Alternate representations (e.g. materialized views, intentional duplication), often for performance.
• Metadata all day long.
• Sprinkler heads are a problem.

• 4 PB is an awfully scary failure domain.
• Sensible application of erasure coding needs five or more nodes.
• East/west traffic is constrained.

• Capacity-motivated deletion is silly in most cases.
• But real deletion probably needs to be encryption-based.
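To see why sensible erasure coding needs five or more nodes, here is a back-of-envelope comparison of storage overhead versus fault tolerance for simple k-data/m-parity codes (the specific (k, m) pairs are illustrative, not a recommendation):

```python
# A k+m code stores k data fragments plus m parity fragments across
# k+m nodes and survives any m simultaneous failures.
def overhead(k: int, m: int) -> float:
    return (k + m) / k

for k, m in [(2, 1), (3, 2), (4, 2), (8, 3)]:
    print(f"{k}+{m}: {k + m} nodes, tolerates {m} failures, "
          f"{overhead(k, m):.2f}x raw storage")

# Compare: 3-way replication needs 3 nodes, tolerates 2 failures, 3.00x.
```

A 3+2 code already takes five nodes; anything smaller either tolerates only one failure or degenerates toward replication's storage cost.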

Mirador (FAST '17)

Centralized three-stage pipeline continuously optimizes placement.

Trend 2: The magic of persistent memory.

Persistent Memory

• Everyone is excited about 3D XPoint.
• (What the heck is 3D XPoint?)

• Bad news: persistent RAM is a total PITA.
• Because it's not really persistent RAM: RAM as you think about it is a total illusion.
• It's really a super-duper fast disk.
• In fact, it's a super-duper fast *single* *unreliable* disk. But more on this in a sec.

• But wait, this doesn't mean that XPoint isn't a spectacularly good idea.
• With it, RAM is about to break through the memory wall (core-to-capacity ratio).
• Technologies like XPoint will give us a multiplier on working set.
• Persistence will massively speed up restart times, especially for read-only data.

One more spanner: Disaggregation.

• Some significant amount of memory is about to move off-host.
• Nobody seems to agree on how this is going to happen.
• "Remote" memory vs. shared physical bus vs. rack-scale CC-NUMA.

• All of these things are interesting in two ways.

• First, failure domains are very different... in ways that apps and OSes are NOT used to reasoning about.

• Second, they afford an entirely new (and exciting!) form of dynamism.
• MapReduce and Spark have a good but very coarse-grained notion of partitioning.
• These systems have the potential to be so much more dynamic.
• Same for scale-out data stores.
• Same for state replication and HA.

So what's going to happen here?

• Total chaos.
• Persistent memory looks like a really fast disk. Disaggregated memory looks like an extension of the cache hierarchy.
• Our view of memory, locality, and persistence is in trouble.
• Interfaces and abstractions really need to change in support of this.
• One prediction: file system and virtual memory will merge.

• Loads of reasons to do this -- serialization overheads, reboots, sharing.
• But still many open questions.
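Today's closest approximation to that merge is mmap, which already blurs the line: a file becomes part of the address space, and msync is the persistence point. A minimal sketch (the counter.dat file name is hypothetical; on true persistent memory the "commit" would be cache-line flushes rather than msync):

```python
import mmap
import os
import struct

PATH = "counter.dat"  # hypothetical file name for this sketch

# Ensure the backing file exists and holds one 8-byte counter.
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.write(b"\x00" * 8)

with open(PATH, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 8)           # the file *is* memory now
    (n,) = struct.unpack_from("<Q", mem, 0)  # load
    struct.pack_into("<Q", mem, 0, n + 1)    # store: a plain memory write
    mem.flush()   # msync: this is the persistence ("commit") point
    mem.close()
```

Run it again after a restart and the counter carries on from where it left off, which is the restart-time win; the open question is everything between the store and the flush.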

Trend 3: Application latency.

Latency

• Tell me if you've heard this one before: CPUs aren't getting faster.
• I/O is getting faster and wider.
• Latency is becoming a dominant metric.

• Direct impact on e.g. purchase probability.
• But it's a much harder metric to work with than throughput.

• Shrinking I/O latencies result in increased computational density.
• Because I/O wait goes away (e.g. DBMS).

• But a latency focus imposes a lot of constraints on software design.
• Especially tail-latency SLOs.
• Need to reason about the slow path as a common case.
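Why tails are harder to work with than averages shows up in a few lines of simulation (the exponential latency distribution here is an illustrative assumption, not measured data): the p99 sits several multiples above the mean, so a p99 SLO is governed by the slow path, not the common case.

```python
import random
import statistics

random.seed(1)
# Toy latency samples (ms): exponentially distributed, mean ~1 ms.
samples = [random.expovariate(1.0) for _ in range(10_000)]

mean = statistics.fmean(samples)
q = statistics.quantiles(samples, n=100)   # 99 percentile cut points
p50, p99 = q[49], q[98]

print(f"mean={mean:.2f}ms  p50={p50:.2f}ms  p99={p99:.2f}ms")
```

For an exponential tail the p99 lands around 4.6x the mean; throughput averages all of this away, which is exactly why it is the easier metric.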

[Figure: "The cost of contention". Throughput (KIOPS, 0-2000) vs. number of cores (1-16), comparing contention-free operation against a single shared lock.]

Decibel (NSDI '17)

[Figure: Decibel architecture. Per-core userspace stacks: DPDK for TCP and SPDK for block I/O, with the Decibel logic between them, mapped directly onto hardware queues and bypassing the kernel.]

• How should we structure a storage system to provide virtual local disks?

• Partition like crazy, crusade against latency, push all unnecessary functionality up the stack.

• This generalizes to applications.

[Figure: Decibel performance (70/30 mixed workload). Throughput (KIOPS) and latency (μs) vs. number of cores (1-16) for Local, Decibel (DPDK), and Decibel (Legacy): 422 vs. 450 vs. 490 μs.]

Everything hurts latency

• Redundancy is a good example of why this gets hard.
• For in-memory systems, network RTT will approach media store time.
• So a remote write doubles the cost.
• Worse: replication at lower layers of the system is invariably amplified.
• This is why emerging data stores don't do it.
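A toy model makes the doubling concrete (both numbers are assumptions, chosen so that the round trip equals the media store time):

```python
# All times in microseconds; both values are illustrative assumptions.
media_us = 5.0   # store time on fast persistent media
rtt_us = 5.0     # intra-rack network round trip

local_write = media_us
remote_write = rtt_us + media_us   # cross the network, then store

print(f"local={local_write}us remote={remote_write}us "
      f"amplification={remote_write / local_write:.1f}x")
```

On spinning disk the RTT was noise next to the seek; here it is half the write, and every extra synchronous replica adds another round trip on the critical path.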

• A real latency focus drives software architecture in a very specific direction.
• Contention is a source of hard-to-reason-about performance variance.
• So avoid contention at all costs. Design it out up front.
• (If you do this right, you benefit from not having to hire developers that understand locking.)

• Doing this right means designing data- and code-level partitioning very carefully.
• Less academically rewarding than OCC and lock freedom, but see parenthetic point above.
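One common way to design contention out is to shard the keyspace so each worker owns its partition outright and the data path needs no locks. A minimal sketch (the names and the thread-per-partition choice are illustrative):

```python
import queue
import threading

N = 4
partitions = [dict() for _ in range(N)]   # one exclusive owner each
queues = [queue.Queue() for _ in range(N)]

def worker(i: int) -> None:
    # Sole writer of partitions[i]: the dict itself needs no lock.
    while True:
        item = queues[i].get()
        if item is None:
            return
        k, v = item
        partitions[i][k] = v

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()

def put(k, v):
    queues[hash(k) % N].put((k, v))   # route by key: one writer per shard

for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    put(k, v)

for q in queues:
    q.put(None)   # sentinel: drain and shut down
for t in threads:
    t.join()

print(sum(len(p) for p in partitions))   # all four writes landed
```

The only synchronization left is inside the queues; no two workers ever touch the same data, so there is no lock, and no lock-induced variance, on the data path.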

And with that, I'm mostly done.

Here are the high-level trends/ideas

1. Diminishing scarcity.
2. Practical/sensible to own your own hardware again.
3. Software needs to change.

Closing thought.

• Nobody is going to adopt your stuff unless you make it as easy as heck for them to do it.
• Expose your research results as a service, or as something as close to a service as is possible.
• Put them in containers, host them on AWS.

• Solve application problems.
• Early experiences working with physical scientists.

