Ingestion, Indexing and Retrieval of High-Velocity Multidimensional Sensor Data on a Single Node Juan A. Colmenares, Reza Dorrigiv and Daniel Waddington <[email protected]> Seminar Series Department of Computer Science University of California, Irvine January 12, 2018 Samsung Research America
Page 1: Ingestion, Indexing and Retrieval of High-Velocity ...juancol.me/rsrc/mdds-cs-uci-20180112.pdf2018/01/12  · Ingestion, Indexing and Retrieval of High-Velocity Multidimensional Sensor

Ingestion, Indexing and Retrieval of High-Velocity Multidimensional Sensor Data on a Single Node

Juan A. Colmenares, Reza Dorrigiv and Daniel Waddington
<[email protected]>

Seminar Series, Department of Computer Science
University of California, Irvine
January 12, 2018

Samsung Research America

Disclaimer

• No part of this presentation necessarily represents the views and opinions of my current employer and research collaborators.

• This material was presented at the 2017 IEEE Int’l Conference on Big Data (IEEE BigData).

Multidimensional Data Sources

• Mobile Devices
• Vehicles
• Data Centers
• Power Grid
• Smart Appliances

Multidimensional Data

• Spatial-temporal data – [time, longitude, latitude, speed, …]

• Sensor data – [time, voltage, current, temp, …]

• Logs – [time, response latency, result count, …]

Example (NYC Taxi Data):
id: 28379, time: 2015/12/04-11:52:21.134, latitude: 40.77, longitude: -73.89, occupants: 3, speed: 43.2 mph

Record: [f1, f2, f3, f4, …, f(N-1), f(N)] (with numerical indexing fields)
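As a rough illustration of a record with numerical indexing fields, the NYC taxi example could be modeled as below. This is only a sketch: the `TaxiRecord` type, its field names and the `indexing_key` helper are hypothetical, and the timestamp is assumed to be UTC so that it can be turned into a number.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Tuple


class TaxiRecord(NamedTuple):
    """Hypothetical model of the example taxi record."""
    id: int
    time: str          # e.g. "2015/12/04-11:52:21.134"
    latitude: float
    longitude: float
    occupants: int
    speed_mph: float


def indexing_key(rec: TaxiRecord) -> Tuple[float, ...]:
    """Return the numerical indexing fields [f1, ..., fN] of a record."""
    ts = datetime.strptime(rec.time, "%Y/%m/%d-%H:%M:%S.%f")
    # Assumption: timestamps are UTC; convert to seconds since the epoch.
    epoch = ts.replace(tzinfo=timezone.utc).timestamp()
    return (epoch, rec.longitude, rec.latitude,
            float(rec.occupants), rec.speed_mph)


rec = TaxiRecord(28379, "2015/12/04-11:52:21.134", 40.77, -73.89, 3, 43.2)
key = indexing_key(rec)  # a 5-dimensional point usable by a spatial index
```

Turning every indexed field into a number is what lets a record be treated as a point in an N-dimensional space.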

Demands for High Ingestion Rates in Industrial IoT

• Some IoT apps require
  – Ingesting millions of recs/sec
  – Processing queries on recently ingested and historical data

• Example
  – Telemetry of power distribution systems with micro-phasor measurement units (μPMU) [1, 2, 3]
  – 512+ samples/sec of AC voltages and currents, and other variables

[1] UC Berkeley, LBNL et al. http://pqubepmu.com/
[2] Pinte et al. Low voltage micro-phasor measurement unit (μPMU). PECI 2015.
[3] Andersen et al. DISTIL: Design and implementation of a scalable synchrophasor data processing system. IEEE SmartGridComm 2015.

New DBMSs to Meet High Ingestion Requirements

• Traditional DBMS
  – Optimized for read-heavy workloads
  – Offer very low ingest rates (< 300K recs/s)

• New time series databases
  – Gorilla [VLDB 2015], BTrDB [FAST 2016]

• New OLAP systems
  – Druid [SIGMOD 2014], VOLAP [CLUSTER 2016], Cubrick [VLDB 2016]

• Scale horizontally
• Sub-second query responses
• Some operate in-memory

• Low per-node ingestion rates

Key Question

Can we build a single-node multidimensional datastore able to:
(1) sustain ingestion rates much higher than those of individual nodes of existing DBMS
(2) while still offering similar query performance?

• To answer it:
  – Adopt a simple design to streamline ingestion
  – Conduct an experimental study confined to
    • Records with numerical indexing fields
    • Range queries

Separate Nodes for Ingestion and Storage

[Architecture diagram. Labels: Data Stream; Data Ingestion Nodes, DI Node (1) to DI Node (N); Interim Storage; Data Storage Nodes, DS Node (1) to DS Node (M); Permanent Storage; Queries; Query Broker Node]

Similar to Druid’s design [SIGMOD 2014]

R-Tree

Family Variants: R*-tree, R+-tree, Hilbert R-tree, X-tree

Source: Wikipedia

K-dimensional (kd) Tree

K-d tree decomposition for the point set (2,3), (5,4), (9,6), (4,7), (8,1), (7,2)

Source: Wikipedia
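To make the decomposition concrete, here is a minimal sketch of the textbook median-split kd-tree construction and an axis-aligned range search, applied to the six points above. The names `KDNode`, `build_kdtree` and `range_search` are illustrative and are not taken from the presentation.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point,
                 left: "Optional[KDNode]", right: "Optional[KDNode]"):
        self.point = point
        self.left = left
        self.right = right


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Split on the median of one axis per level, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid],
                  build_kdtree(ordered[:mid], depth + 1),
                  build_kdtree(ordered[mid + 1:], depth + 1))


def range_search(node: Optional[KDNode], lo: Point, hi: Point,
                 depth: int = 0) -> List[Point]:
    """Return all points p with lo[i] <= p[i] <= hi[i] on every axis."""
    if node is None:
        return []
    axis = depth % len(lo)
    found = []
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        found.append(node.point)
    if lo[axis] <= node.point[axis]:   # left subtree holds values <= split
        found += range_search(node.left, lo, hi, depth + 1)
    if hi[axis] >= node.point[axis]:   # right subtree holds values >= split
        found += range_search(node.right, lo, hi, depth + 1)
    return found


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hits = range_search(tree, (4, 1), (8, 4))
```

With this construction the root splits on x at (7,2), and its children split on y at (5,4) and (9,6). The range search skips any subtree that lies entirely on the wrong side of a split.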

Two-Level Indexing Scheme

1. An R*-Tree indexes data segments (bounding boxes)

2. A KD-Tree in each segment indexes individual records

[Figure labels: R*-Tree, in memory (Level 1); KD-Tree (Level 2); Serialized Data Segments (with the records)]

Bounding Box = {d1,min, d1,max, …, d3,min, d3,max}

Similar to EMINC [CloudDB 2009]

Two-Level Indexing Scheme

[Figure: a Range Query with numbered steps 1, 2 and 3. Labels: R*-Tree, in memory (Level 1); KD-Tree (Level 2); Data Segments (with the records)]

Bounding Box = {d1,min, d1,max, …, d3,min, d3,max}
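The two-level lookup can be sketched as follows. This is a simplified stand-in and not the presented implementation: a linear scan over segment bounding boxes replaces the R*-tree at level 1, and a plain filter over the segment's records replaces the per-segment KD-tree at level 2. `Segment`, `intersects`, `inside` and `range_query` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (per-dimension minima, per-dimension maxima)


def bounding_box(records: Sequence[Point]) -> Box:
    dims = range(len(records[0]))
    return (tuple(min(r[d] for r in records) for d in dims),
            tuple(max(r[d] for r in records) for d in dims))


def intersects(box: Box, lo: Point, hi: Point) -> bool:
    """True if the box and the query range overlap on every dimension."""
    bmin, bmax = box
    return all(bmin[d] <= hi[d] and lo[d] <= bmax[d] for d in range(len(lo)))


def inside(p: Point, lo: Point, hi: Point) -> bool:
    return all(l <= c <= h for l, c, h in zip(lo, p, hi))


class Segment:
    def __init__(self, records: Sequence[Point]):
        self.records = list(records)
        self.box = bounding_box(self.records)


def range_query(segments: Sequence[Segment], lo: Point, hi: Point) -> List[Point]:
    hits: List[Point] = []
    for seg in segments:
        # Level 1: discard segments whose bounding box misses the query.
        if intersects(seg.box, lo, hi):
            # Level 2: search the records of the surviving segment.
            hits += [r for r in seg.records if inside(r, lo, hi)]
    return hits


segments = [Segment([(0, 0, 0), (1, 1, 1)]), Segment([(5, 5, 5), (6, 6, 6)])]
result = range_query(segments, (0.5, 0.5, 0.5), (5, 5, 5))
```

Only segments that survive the bounding-box test need to be read at all, so heavily overlapping boxes mean more segments are touched per query.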

Data Segment

[Figure labels: Packed KD-Tree (Serialized); Data Records]

Record Descriptor

Data Ingestion Procedure

• Steps 1 – 5 are performed only in memory

Multi-Dimensional Datastore (MDDS)

• μ-batches
• Thread Parallelism (Chunks processed independently)
• Data accessible to queries from memory before becoming persistent
• Concurrent Queries (while ingesting data)
• Exploit data locality
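The μ-batch and thread-parallelism ideas can be pictured with the sketch below. It only illustrates the structure (independent chunks, results visible in memory right away) and makes no claim about MDDS internals. `micro_batches`, `InMemoryIndex` and `ingest` are hypothetical names, and Python threads are used for brevity rather than throughput.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def micro_batches(records: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Cut the incoming stream into chunks of at most `size` records."""
    batch: List[Point] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class InMemoryIndex:
    """Holds (bounding box, records) pairs that queries could read at once."""

    def __init__(self) -> None:
        self._lock = Lock()
        self.segments: list = []

    def add(self, segment) -> None:
        with self._lock:
            self.segments.append(segment)


def ingest_chunk(chunk: List[Point], index: InMemoryIndex) -> int:
    # Each chunk is processed independently of all the others.
    dims = range(len(chunk[0]))
    box = (tuple(min(r[d] for r in chunk) for d in dims),
           tuple(max(r[d] for r in chunk) for d in dims))
    index.add((box, chunk))
    return len(chunk)


def ingest(records: Iterable[Point], index: InMemoryIndex,
           batch_size: int = 4, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda c: ingest_chunk(c, index),
                          micro_batches(records, batch_size))
        return sum(counts)


index = InMemoryIndex()
total = ingest([(float(i), float(i % 3)) for i in range(10)], index)
```

The only shared state is the list of finished segments, guarded by a lock, so workers do not coordinate while they build a chunk.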

Evaluation

Systems
• Percona Server (enhanced MySQL) with storage engines XtraDB, MyISAM, and TokuDB
• SQLite3
• Druid [SIGMOD 2014]

Datasets and Queries
• NYC Taxi Trip
  – ~169M records
  – 10 numerical fields (out of 14)
  – Queries: 16 randomly generated queries on 1 km × 1 km areas
• US NOAA’s Global Historical Climatology Network - Daily (GHCN-Daily)
  – First 100M records
  – 6 numerical fields (out of 7)
  – Queries: 10 meaningful handcrafted queries (e.g., the average snow depth for Mount McKinley in Alaska)

Test Platform: Dell PowerEdge R720 Server
• Two 2.50-GHz Intel Xeon processors (20 hardware threads), 64GB of RAM, and an Intel 750 400GB SSD with ext4 file system.
• Ubuntu 14.04 LTS (Linux kernel 3.13.0-71).

Details of experimental setup at: https://arxiv.org/abs/1707.00825

Test Queries on NYC Taxi Data: Randomly Generated

Test Queries on GHCN-Daily Data: Meaningful Handcrafted

Characterization of Data Segmentation Schemes

• Uniformly Random Scheme (very simple)
  – Records assigned to data segments chosen uniformly at random

• Kd-tree partitioning based scheme
  – Tries to create well-populated segments with small overlap among their bounding hyperrectangles
  – Our hypothesis
    • It limits read amplification and improves query performance (but not quite ☹!)
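A minimal sketch of the uniformly random scheme, together with a helper that counts overlapping pairs of bounding boxes, is given below. The function names, the fixed number of segments and the seed parameter are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (per-dimension minima, per-dimension maxima)


def random_segmentation(records: Sequence[Point], num_segments: int,
                        seed: Optional[int] = None) -> List[List[Point]]:
    """Assign every record to a segment chosen uniformly at random."""
    rng = random.Random(seed)
    segments: List[List[Point]] = [[] for _ in range(num_segments)]
    for rec in records:
        segments[rng.randrange(num_segments)].append(rec)
    return [s for s in segments if s]  # drop segments that stayed empty


def boxes_overlap(a: Box, b: Box) -> bool:
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[d] <= bmax[d] and bmin[d] <= amax[d]
               for d in range(len(amin)))


def count_overlaps(boxes: Sequence[Box]) -> int:
    """Number of unordered pairs of bounding boxes that intersect."""
    return sum(boxes_overlap(boxes[i], boxes[j])
               for i in range(len(boxes))
               for j in range(i + 1, len(boxes)))


records = [(float(x), float(y)) for x in range(10) for y in range(10)]
segments = random_segmentation(records, num_segments=4, seed=1)
overlaps = count_overlaps([((0, 0), (1, 1)), ((2, 2), (3, 3)),
                           ((0.5, 0.5), (2.5, 2.5))])
```

Because placement ignores record values, each segment tends to span most of the data space, so its bounding box is likely to intersect many others.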

KD-tree Partitioning Based Scheme

Chunk of Records (μ-batch) → (1) Bulk Loading → KD-Tree → (2) KD-Tree Partitioning → Partitioned KD-Tree → (3) Assembly (Serialization) → Data Segment

(2) KD-Tree Partitioning
• Traverses the tree in depth-first pre-order, grouping the records based on the number of nodes in the subtrees (with enough records)
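The grouping step can be approximated with the sketch below. It is a simplification rather than the presented algorithm: instead of bulk loading a KD-tree and then traversing it, it fuses both into one recursive median split and emits a group once it holds at most `max_records` records. It does not enforce a minimum segment size. `kd_partition` and `max_records` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kd_partition(records: Sequence[Point], max_records: int,
                 depth: int = 0) -> List[List[Point]]:
    """Split records into spatially coherent groups of at most max_records."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    if len(records) <= max_records:
        return [list(records)]
    axis = depth % len(records[0])          # cycle through the dimensions
    ordered = sorted(records, key=lambda r: r[axis])
    mid = len(ordered) // 2
    # Depth-first: the left half is fully emitted before the right half.
    return (kd_partition(ordered[:mid], max_records, depth + 1)
            + kd_partition(ordered[mid:], max_records, depth + 1))


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
groups = kd_partition(points, max_records=2)
```

Each group comes from one side of a sequence of median splits, which is what keeps the bounding boxes of different groups from overlapping much.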

Comparison between Segmentation Schemes: Ingestion Throughput

Comparison between Segmentation Schemes: Number of Overlaps among Segments

[Chart annotation: Fewer overlaps for kd-tree partitioning, except in this case]

Comparison between Segmentation Schemes: Query Performance

Couldn’t validate our hypothesis: the kd-tree partitioning scheme does not yield better query performance

Single-Threaded / Bulk loading Ingestion

• 11x w/ binary data
• 2x w/ CSV data

Ingestion Thread Scaling and Influence of Queries

Percona Server, SQLite & Druid reported 35K, 30K, and 55K recs/s, respectively.

• 230x in the multithreaded scenario
• 160x overall
• 27x w/ CSV data

[Chart annotation: -20%]

Query Response Times for NYC Taxi Data

• Queries on 1 km² areas with ranges in time, trip duration and passenger count.
• Percona Server and SQLite with a single multicolumn index.

• MDDS performs comparably to or better than Percona Server in 12 queries.
• It outperforms SQLite in Q7 and Q14.
• It outperforms Druid in Q6-Q16 (on 3- to 5-dimensional ranges).

Query Response Times for GHCN-Daily Data

• 10 meaningful queries (e.g., average snow depth for Mt. McKinley in Alaska).
• Percona Server and SQLite with multiple indices tailored to the queries.

• As expected, the RDBMSs outperform MDDS across all queries.
• MDDS outperforms Druid in half of the queries (with 3+ dimensional ranges).

Storage Footprint (in GB)

MDDS occupies
• 20-42% less storage space than the RDBMSs
• Up to 2x the space used by Druid (w/ heavy data compression)

Conclusions

• Developed a multidimensional datastore able to
  – Ingest high-velocity sensor data
  – Offer decent query performance

• Showed potential for significant reductions in the number of cluster nodes required to ingest high-velocity sensor data

• Compared a random scheme and a kd-tree partitioning based scheme for data segmentation
  – Kd-tree partitioning scheme produced less overlap between data segments, but did not yield better query performance
  – The random scheme is very simple and faster
    • Our first choice

Thanks

Questions?

