Copyright © 2016, Oracle and/or its affiliates. All rights reserved.
Exadata: Delivering Memory Performance with Shared Flash
Setting New Standards for Database Performance
Kothanda Umamageswaran, Vice President, Exadata Development
Gurmeet Goindi, Technical Product Strategist, Exadata
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Announcing Oracle Database 12c Release 2 on Oracle Cloud
• Available now
 – Exadata Express Cloud Service
• Coming soon
 – Database Cloud Services
 – Exadata Cloud Machine
Oracle is presenting features for Oracle Database 12c Release 2 on Oracle Cloud. We will announce availability of the On-Prem release sometime after OpenWorld.
Did You Miss the Storage Revolution? Good Chance Your Storage Vendor Did Too
• Incumbent storage vendors have decades-old investments in legacy protocols, keeping them from adopting new technologies
• PCIe Flash with the NVMe interface is a new interface that realizes full flash potential
• PCIe/NVMe storage architectures are orders of magnitude faster than what you probably use today
• Available now with Oracle Exadata storage
[Figure: storage performance over time. 2014 divides the conventional storage era (SCSI, SAS; storage vendors) from the modern flash era (PCIe, NVMe).]
Solid State Media is Very Different Than Spinning Disk
• Compared to spinning disk, flash
 – Is many orders of magnitude faster
 – Has many orders of magnitude higher bandwidth
 – Has extremely low latency
 – Has wearing issues as it ages, but technology is catching up
 – Is expensive, but the price gap is shrinking
• Every storage vendor has some flash-based solution for your database
Q: Will my database realize the full benefit of flash technology?
A: It will depend on how fast you can move the data from the flash to the database
SCSI Access Model
• SCSI was designed for tapes and HDDs
• HDDs are sequential whereas flash devices are massively parallel
• Traditional IO stack is optimized for spinning media
 – 512-byte block size transfers
 – Flash and databases do 4 KB/8 KB IOs
• Using legacy interfaces like SCSI fundamentally bottlenecks flash drives
[Figure: CPU to HBA to SCSI path, with 8 KB and 4 KB IOs carried as 512 B transfers]
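To make the block-size mismatch concrete, here is a minimal sketch of the arithmetic, assuming (as the slide's figure suggests) that each database IO is carried as 512-byte transfers. The function name is illustrative and not part of any real IO stack.

```python
SECTOR_BYTES = 512  # legacy transfer unit named on the slide


def transfers_per_io(io_bytes: int, unit_bytes: int = SECTOR_BYTES) -> int:
    """Number of unit-sized transfers needed to carry one IO (ceiling division)."""
    return -(-io_bytes // unit_bytes)


for io_kb in (4, 8):
    count = transfers_per_io(io_kb * 1024)
    print(f"{io_kb} KB IO -> {count} transfers of {SECTOR_BYTES} B")
```

Under this assumption a single 8 KB database block turns into sixteen 512-byte units, which is the overhead the slide attributes to the legacy stack.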
PCI Express vs. SAS Connectivity
• PCI Express is orders of magnitude faster than SAS, and is getting faster
• PCI Express has the same characteristics as flash
 – High throughput
 – Low latency
• Using legacy interconnects like SAS fundamentally bottlenecks flash drives
Throughput (GB/s): SAS 6 Gbps 0.6; SAS 12 Gbps 1.2; PCIe 3.0 x4 4; PCIe 3.0 x8 8
PCIe has 13x throughput of SAS
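The 13x headline appears to compare the widest PCIe link on the chart with the slowest SAS link. The sketch below simply divides the chart values; the dictionary and helper names are made up for illustration.

```python
# Throughput values in GB/s as read from the slide's chart.
throughput_gb_s = {
    "SAS 6 Gbps": 0.6,
    "SAS 12 Gbps": 1.2,
    "PCIe 3.0 x4": 4.0,
    "PCIe 3.0 x8": 8.0,
}


def ratio(fast: str, slow: str) -> float:
    """How many times more throughput `fast` has than `slow`."""
    return throughput_gb_s[fast] / throughput_gb_s[slow]


for slow in ("SAS 6 Gbps", "SAS 12 Gbps"):
    print(f"PCIe 3.0 x8 vs {slow}: {ratio('PCIe 3.0 x8', slow):.1f}x")
```

Against SAS 12 Gbps the same x8 link comes out well below 13x, so the headline number depends on which pair is compared.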
PCI Express Flash with NVMe Interface
• Non-Volatile Memory Express is a brand new, ground-up interface designed for flash
• NVMe is inherently parallel
• NVMe provides native atomic IO size affinity for databases
• NVMe IO stack massively reduces CPU utilization and latency
PCI Express Flash with NVMe Interface is the right choice for your database
[Figure: CPU to PCIe to NVMe path; NVMe is 2.2x faster than SCSI]
Exadata is Leading NVMe Adoption
Thousands of Exadata systems shipped with NVMe flash since 2014
Timeline, 2014 to Q1 2016:
• 1st NVMe drive by Samsung
• 1st NVMe drive by Intel
• Exadata X5-2: industry’s first enterprise system with NVMe
• Exadata Cloud Service uses NVMe in public cloud
• EMC announces DSSD D5 with NVMe
• Facebook launches Lightning based on NVMe
• Exadata X6-2: second generation with NVMe
Shared Storage Has Many Advantages over Local Storage
• Much better space utilization
• Much better security, management, reliability
• Enables DB consolidation, DB high availability, RAC scale-out
• Shares storage performance
 – Aggregate performance of shared storage can be dynamically used by any server that needs it
[Figure: servers connected over SAN/LAN to shared storage]
New Exadata X6 Super-Capacity and Performance Flash
• 3D V-NAND, 3.2 TB/card (2X previous card capacity)
 – 48-layer NAND
 – No tradeoffs: faster writes, lower power, higher endurance
• Latest, most modern interface: NVMe (introduced in X5)
• Fastest flash card on market by wide margin
 – Only flash card on market with PCIe 8-lane scale bandwidth, ~5.4 GB/sec
 – Highest IOs per second
 – Lowest outliers: 99.995% of write IOs complete within 250 us
NVMe PCIe Flash Disrupts the Storage Array Model
• Latest PCIe flash: 5.4 GB/sec
• SAN link = 40 Gb, 5 GB/sec: less than 1 flash card
• Leading all-flash array, 24 GB/sec: less than 5 flash cards
New improvements are causing 100X bottlenecks across shared storage stack
All-Flash Storage Array IO Path: many steps, each adds latency and creates bottlenecks
[Figure: IO path components: CPU, host HBA, SAN/LAN, switches, SAN/LAN, array heads, SAS/SATA, SSD controller, PCIe, flash chips]
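The "less than 1 flash card" and "less than 5 flash cards" comparisons follow from dividing each limit by the per-card bandwidth. A small sketch of that arithmetic, using only the figures quoted on the slide (the helper names are illustrative):

```python
FLASH_CARD_GB_S = 5.4  # per-card bandwidth quoted on the slide


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link speed in Gb/s to GB/s (8 bits per byte, no protocol overhead)."""
    return gbit_per_s / 8


def cards_to_saturate(limit_gb_s: float) -> float:
    """How many flash cards' worth of bandwidth fits through a given limit."""
    return limit_gb_s / FLASH_CARD_GB_S


san_link_gb_s = gbit_to_gbyte(40)   # 40 Gb SAN link
array_gb_s = 24                     # "leading all-flash array" figure

print(f"40 Gb SAN link: {san_link_gb_s:.1f} GB/s = {cards_to_saturate(san_link_gb_s):.2f} cards")
print(f"All-flash array: {array_gb_s} GB/s = {cards_to_saturate(array_gb_s):.2f} cards")
```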
Only Exadata Achieves Full Performance of Shared Flash
• Leading all-flash storage arrays achieve under 3% of potential flash throughput
 – Pure Storage: 132 MB/sec per flash drive
 – EMC XtremIO: 120 MB/sec per flash drive
 – Spinning-disk-level throughput!
 – AND can’t scale out for higher performance
 – AND can’t share even this slow performance due to bottleneck at server inputs
• Exadata X6 achieves full flash throughput
 – 5400 MB/sec per drive
• Exadata also achieves much faster OLTP IOs
 – 5.6 million IOPS, 250 us latency even at 2.4M IOs
[Figure: bar chart (scale 0 to 500) of actual vs. potential throughput* for Exadata single rack, Pure Storage largest, and EMC XtremIO 4-brick; the gap for the two arrays is labeled "Wasted Flash Potential"]
* Potential throughput based on number of flash devices
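The "under 3%" claim can be reproduced roughly from the per-drive numbers. The sketch below assumes, for illustration only, that each array drive could potentially deliver the 5400 MB/sec quoted for an Exadata X6 flash drive; the slide does not state the per-device potential it used for the arrays.

```python
# Assumption: per-drive potential equals the Exadata X6 per-drive figure.
POTENTIAL_MB_S_PER_DRIVE = 5400

# Actual per-drive throughput as quoted on the slide.
actual_mb_s_per_drive = {
    "Pure Storage": 132,
    "EMC XtremIO": 120,
}


def utilization(actual_mb_s: float, potential_mb_s: float = POTENTIAL_MB_S_PER_DRIVE) -> float:
    """Fraction of the potential per-drive throughput actually achieved."""
    return actual_mb_s / potential_mb_s


for name, actual in actual_mb_s_per_drive.items():
    print(f"{name}: {utilization(actual):.1%} of potential")
```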
Exadata Achieves Memory Performance with Shared Flash
• Exadata X6 delivers 300 GB/sec flash bandwidth to any server
 – Approaches 800 GB/sec aggregate DRAM bandwidth of DB servers
• Must move compute to data to achieve full flash potential
 – Requires owning full stack, can’t be solved in storage alone
• Fundamentally, storage arrays can share flash capacity but not flash performance
 – Even with next-gen scale-out, PCIe networks, or NVMe over fabric
 – E.g. new EMC DSSD has 3-6 times slower throughput than Exadata X6
• Shared storage with memory-level bandwidth is a paradigm change in the industry
 – Get near-DRAM throughput, with the capacity of shared flash
[Figure: Exadata DB servers connected over InfiniBand to Exadata Smart Storage (CPU, PCIe, NVMe, flash chips) with query offload]
Exadata X6 I/O is Much Faster than All-Flash EMC
• One High Capacity Exadata beats the fastest EMC XtremIO all-flash array in every performance metric
 – 12X more throughput
 – 2.5X more IOPS
 – 2X faster latency
Analytic scans (GB/sec): 8 X-Brick EMC XtremIO 24; 1 rack HC Exadata 301 (12X)
OLTP write IOPS: 8 X-Brick EMC XtremIO 2M; 1 rack HC Exadata 5.2M (2.5X)
EMC performance does not scale higher; Exadata scales by adding racks
Exadata X6 I/O is Much Faster than All-Flash Pure Storage
• One High Capacity Exadata beats the fastest Pure Storage all-flash array in every performance metric
 – 33X more throughput
 – 4X more IOPS
 – 4X faster latency
Analytic scans (GB/sec): Pure Storage //M70 9; 1 rack HC Exadata 301 (33X)
OLTP write IOPS: Pure Storage //M70 1.2M; 1 rack HC Exadata 5.2M (4X)
Pure Storage does not scale higher; Exadata scales by adding racks
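The multipliers on this slide and on the preceding EMC comparison are the ratios of the chart values. A short sketch that recomputes them from the numbers shown (the data structure is illustrative):

```python
# Chart values from the two comparison slides.
exadata = {"scan_gb_s": 301, "write_iops_millions": 5.2}
arrays = {
    "EMC XtremIO (8 X-Brick)": {"scan_gb_s": 24, "write_iops_millions": 2.0},
    "Pure Storage //M70": {"scan_gb_s": 9, "write_iops_millions": 1.2},
}


def speedups(array: dict) -> dict:
    """Ratio of the Exadata figure to the array figure for each metric."""
    return {metric: exadata[metric] / value for metric, value in array.items()}


for name, figures in arrays.items():
    s = speedups(figures)
    print(f"{name}: scans {s['scan_gb_s']:.1f}x, write IOPS {s['write_iops_millions']:.1f}x")
```

The computed ratios come out slightly above the rounded headline figures (for example about 12.5x against the quoted 12X).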
Getting Memory Performance with Shared Flash using Smart Software
Oracle’s Infrastructure Innovations in Flash
• Oracle Exadata V2: First to bring flash storage to the database market
• Oracle Exadata X3: Doubled flash capacity
• Oracle Exadata X4: 100 GB/s throughput scans in a single rack
• Oracle Exadata X5: Lowest latency NVMe and increases scans to 263 GB/s
• Oracle Exadata X5: Hot-pluggable NVMe server for the database
• Oracle Linux: First Linux vendor with production NVMe drivers
• Oracle Exadata X6: Highest throughput, over 350 GB/s, and lowest latency
Oracle’s Software Innovations in Flash
• Exadata Smart Flash Cache
• Exadata Smart Flash Log
• Exadata Smart Flash Cache Scan Awareness
• Exadata Smart File Initialization
• Exadata Smart Columnar Flash Cache
• Exadata Smart Flash Cache Space Resource Management
• Upcoming: Exadata Smart In-Memory Formats in Flash
• Upcoming: Smart write burst and temp IO in Flash Cache
Exadata Smart Flash Cache
• Understands different types of I/Os from database
 – Skips caching I/Os to backups, data pump I/O, archive logs, tablespace formatting
 – Caches control file reads and writes, file headers, data and index blocks
 – Enables more space for relevant user data
• Immediately adapts to changing workloads
• Write-back flash cache
 – Caches writes from the database, not just reads
• Doesn’t need to mirror in flash for read-intensive workloads
 – Flash arrays always store both mirror copies in flash, increasing your cost
• Smart Scans can run at the throughput of flash drives
 – Flash arrays need lots of servers with lots of processes and still cannot match Smart Scan throughput of a single query
• Provides performance of flash at cost of disk
[Figure: storage tiers: 12 TB DRAM, 180 TB PCI flash, 1.3 PB disk; data labeled hottest, active, and cold]
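The idea of a cache that knows the I/O type can be sketched as a simple admission check. This is only an illustration of the behavior listed on the slide, not Exadata's implementation: the type names and the default for unlisted types are assumptions.

```python
# I/O categories taken from the slide; the string labels are hypothetical.
SKIP_CACHE = {"backup", "data_pump", "archive_log", "tablespace_format"}
ALWAYS_CACHE = {"control_file", "file_header", "data_block", "index_block"}


def should_cache(io_type: str) -> bool:
    """Decide whether an I/O of the given type is admitted to the flash cache."""
    if io_type in SKIP_CACHE:
        return False
    if io_type in ALWAYS_CACHE:
        return True
    # Assumption: types the slide does not mention are not cached.
    return False


for io_type in ("index_block", "backup", "control_file", "archive_log"):
    print(f"{io_type}: {'cache' if should_cache(io_type) else 'skip'}")
```

A block-level array sees only reads and writes, so it has no equivalent of the `io_type` argument to decide on.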
Exadata Smart Flash Log
• Outliers in log IO slow down lots of clients
• Outliers from any one copy of the mirror slow down all the foregrounds
 – Database wait time goes up by #foregrounds * stall time
 – Backlog doesn’t clear immediately, like an accident on the freeway, and increases “log file sync” waits
• Performance-critical algorithms like space management and index splits are sensitive to log write latency
• Legacy storage IO cannot differentiate redo log IO from others
• UPS-protected cache in traditional storage seems to work initially, until the cache is overwhelmed by other writes
 – Measure log file latency with full backup or a data load running
[Figure: client foregrounds write to the log buffer, which is flushed by the Log Writer]
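The "#foregrounds * stall time" relationship is easy to underestimate, so here is a small worked example. The session count and stall duration are made-up inputs, and the function name is illustrative.

```python
def added_wait_ms(foregrounds: int, stall_ms: float) -> float:
    """Total database wait time added when every waiting foreground sees one log-write stall."""
    return foregrounds * stall_ms


# Hypothetical case: 200 foreground sessions waiting on one 50 ms log-write outlier.
total = added_wait_ms(foregrounds=200, stall_ms=50)
print(f"One 50 ms outlier costs {total / 1000:.0f} s of accumulated wait across 200 sessions")
```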
Exadata Smart Flash Log
• Smart Flash Log uses flash as a parallel write cache to disk controller cache
• Whichever write completes first wins (disk or flash)
• Reduces response time and outliers
 – “log file parallel write” histogram improves
 – Greatly improves “log file sync”
• Uses almost no flash capacity (<0.1%)
• Network resource management provides priority for redo log I/Os across the network
• OLTP workloads transparently accelerated and provide predictable response times
[Figure: log write latency with Smart Logging off vs. on; no outliers with Smart Logging on]
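"Whichever write completes first wins" means the latency seen by the database is the minimum of the two paths for each write. The sketch below models that with invented latency samples (microseconds); it is not a measurement, and it assumes outliers on the two devices rarely coincide.

```python
def effective_log_latency(disk_us: list, flash_us: list) -> list:
    """Per-write latency when the first of the two parallel writes to complete wins."""
    return [min(d, f) for d, f in zip(disk_us, flash_us)]


# Made-up samples: each device has one outlier, at different times.
disk_us = [300, 280, 9000, 310]
flash_us = [200, 4000, 210, 190]

effective = effective_log_latency(disk_us, flash_us)
print("disk only :", max(disk_us), "us worst case")
print("flash only:", max(flash_us), "us worst case")
print("first wins:", max(effective), "us worst case")
```

With these samples neither device alone avoids an outlier, but the combined path does, which is the effect the histogram on the slide illustrates.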
Exadata Smart Flash Cache Scan Awareness
• On a traditional cache, if you scan a data set larger than the cache size
 – Blocks 0, 1, 2, 3 brought into cache, cache is full
 – Scanning blocks 20, 21, 22, 23 replaces 0, 1, 2, 3 in cache
• Repeat the same scan
 – Blocks 0, 1, 2, 3 will replace blocks 20, 21, 22, 23
 – Blocks 20, 21, 22, 23 will again replace blocks 0, 1, 2, 3
• Traditional caches churn with no actual benefit
• Some implementations call the insertion of a new block in the middle scan resistant
[Figure: cache with hot and cold ends; a new block is inserted in the middle; churn]
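The churn described above can be reproduced with a few lines of simulation. This sketch uses a plain least-recently-used cache as a stand-in for a "traditional cache" (an assumption; the slide does not name a policy) with the slide's 4-block cache and blocks 0 to 23.

```python
from collections import OrderedDict


def lru_scan_hits(cache_size: int, blocks: list, passes: int) -> int:
    """Count cache hits when the same sequential scan is repeated against an LRU cache."""
    cache = OrderedDict()
    hits = 0
    for _ in range(passes):
        for block in blocks:
            if block in cache:
                hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                if len(cache) >= cache_size:
                    cache.popitem(last=False)  # evict least recently used
                cache[block] = True
    return hits


hits = lru_scan_hits(cache_size=4, blocks=list(range(24)), passes=3)
print(f"hits over 3 scans of 24 blocks with a 4-block LRU cache: {hits}")
```

Every block is evicted before the scan comes back to it, so the cache does work on every access and never produces a hit.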
Exadata Smart Flash Cache Scan Awareness
• Exadata Smart Flash Cache is scan resistant
 – Ability to bring a subset of the data into cache and not churn
 – OLTP and DW scan blocks can co-exist
• Nested scans bring in repeated accesses
 – Repeat: for each item in large table, scan small table
 – Smart enough to pull the small table into flash since it is accessed repeatedly, even though the size of the large table alone is larger than flash cache
• No need to set “KEEP” attribute in data warehouses
• Scans automatically use flash for extreme performance
• Scans won’t blow out the cache, providing predictable OLTP performance
[Figure: cache with hot and cold ends]
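One way to picture "bring a subset of the data into cache and not churn" is a cache that admits scan blocks until it is full and then stops evicting for the rest of the scan. This is a deliberately simplified sketch of that idea, not Exadata's actual algorithm; the function name and sizes are illustrative.

```python
def subset_scan_hits(cache_size: int, blocks: list, passes: int) -> int:
    """Count hits when scan blocks are cached only while free space remains (no eviction)."""
    cache = set()
    hits = 0
    for _ in range(passes):
        for block in blocks:
            if block in cache:
                hits += 1
            elif len(cache) < cache_size:
                cache.add(block)  # keep a stable subset of the scanned data
            # otherwise: read from disk and leave the cache untouched
    return hits


subset_hits = subset_scan_hits(cache_size=4, blocks=list(range(24)), passes=3)
print(f"hits over 3 scans of 24 blocks with a 4-block subset cache: {subset_hits}")
```

A least-recently-used cache of the same size gets no hits on this workload, while the stable subset is served from cache on every repeat scan.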
Exadata Smart File Initialization
• Combine the benefits of Smart Initialization and Write-back Flash Cache
 – Write file creation metadata to write-back flash cache
 – Tiny amount of flash space used to cache large portions of initialized data on disk
 – Initialization I/Os to disk deferred, or not performed if data loaded
• Create tablespace, file extensions, autoextends show benefit
• Redo log initialization included in Exadata 12.1.1.1.0
• File creation sped up by over 10x
[Figure: the database sends metadata to the storage cell, where it is held in flash ahead of the disks]
Exadata Smart Columnar Flash Cache
• Hybrid Columnar Compression balances need for OLTP and analytics
• As CPUs get faster, want even faster scans
• Smart Flash Cache automatically transforms blocks from hybrid columnar to pure columnar for analytics during flash cache population
• Dual format representation for single row lookups
• Only selected columns read from flash during a query
• Up to 5x query speedup
[Figure: for "select columnA from table where …", compression units are rewritten as columns during flash cache population]
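The benefit of a pure columnar layout is that a query touching one column reads only that column's values. The toy example below illustrates the layout change with ordinary Python structures; the data and function name are invented, and it says nothing about the actual on-flash format.

```python
# Hypothetical rows, as they might sit together in a row-oriented unit.
rows = [
    {"columnA": 1, "columnB": "x", "columnC": 1.5},
    {"columnA": 2, "columnB": "y", "columnC": 2.5},
    {"columnA": 3, "columnB": "z", "columnC": 3.5},
]


def to_columnar(rows: list) -> dict:
    """Regroup row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columnar = to_columnar(rows)

# "select columnA from table": only one of the three column lists is touched.
selected = columnar["columnA"]
print(selected)
```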
Smart Flash Cache Space Resource Management
• Flash cache is a shared resource
• Database as a Service creates need for efficient resource sharing
• Specify minimum (flashCacheMin) and maximum (flashCacheLimit) sizes, or fixed allocations (flashCacheSize), a database can use in the flash cache
ALTER IORMPLAN -
 dbplan=((name=sales, flashCacheSize=100G), -
 (name=finance, flashCacheLimit=100G, flashCacheMin=20G), -
 (name=schain, flashCacheSize=200G))
• Container database resource specified at the storage
• Pluggable database container resource limits expressed as percentages in the container database
• Database and pluggable database I/O resource management is unique to Exadata
• Predictable performance for database queries: no more noisy neighbor
[Figure: flash cache shared among FINANCE, SUPPLY CHAIN, and SALES]
Upcoming: In-Memory Format in Columnar Flash Cache
• In-Memory formats used in Smart Columnar Flash Cache
• Enables vector processing on storage server during smart scans
 – Multiple column values evaluated in a single instruction
• Faster decompression speed than Hybrid Columnar Compression
• Enables dictionary lookup and avoids processing unnecessary rows
• Smart Scan results sent back to database in In-Memory Columnar format
 – Reduces database node CPU utilization
• In-memory performance seamlessly extended from DB node DRAM memory to 10x capacity flash in storage
 – Even bigger differentiation against all-flash arrays and other in-memory databases
[Figure: in-memory columnar scans and in-flash columnar scans]
Upcoming release of Exadata Software
Upcoming: Smart Write Bursts and Temp IO in Flash Cache
• Write throughput of four flash cards has become greater than the write throughput of 12 disks
• When database write throughput exceeds the throughput of disks, Smart Flash Cache intelligently caches writes
 – Schema changes during application upgrades rewrite entire tables in some packaged applications
 – Large database consolidations can have write bursts at the same time
• When queries write a lot of temp IO and it is bottlenecked on disk, Smart Flash Cache intelligently caches temp IO
 – Writes to flash for temp spill reduce elapsed time
 – Reads from flash for temp reduce elapsed time further
• Smart to prioritize OLTP data and does not remove hot OLTP lines from the cache
• Smart flash wear management for large writes
• Much faster scans and disk writes
Upcoming release of Exadata Software
Preview: Non-volatile Memory Tier in Exadata Storage
• Exadata Storage Servers will add a non-volatile memory (NVRAM) cache in front of flash memory
 – Similar to current flash cache in front of disk
 – RDMA direct access to NVRAM gives 20x lower latency than flash
• NVRAM used as a cache effectively increases its capacity by 10x
• Expensive NVRAM shared across servers for lower cost
• NVRAM mirrored across storage servers for fault tolerance
[Figure: compute server accessing NVRAM on the storage server over RDMA, with hot, warm, and cold data tiers]
Exadata Smart Flash Benefits
• Smart Flash Cache is database aware
• Smart Flash Logging avoids redo log outliers
• Smart Flash Cache Scan provides subset scanning and is table scan resistant
• Smart File Initialization creates a file by writing metadata to flash cache
• Smart Columnar Flash Cache extends columnar benefit to storage
• Smart Flash Cache Space Resource Management provides granular control
• Upcoming: Smart Flash Cache with in-memory formats enables massive capacity for vector processing
• Upcoming: Smart write burst and temp IO in Flash Cache