A Comparison of Two Physical Data Designs for Interactive Social Networking Actions

Sumita Barahmand, Shahram Ghandeharizadeh, Jason Yap
Database Laboratory Technical Report 2012-08
Computer Science Department, USC
Los Angeles, California 90089-0781
{barahman,shahram,jyap}@usc.edu
October 7, 2013

A shorter version of this paper appeared in the ACM International Conference on Information and Knowledge Management (CIKM), San Francisco, CA, Oct 2013.

Abstract

This paper compares the performance of an SQL solution that implements a relational data model with a document store named MongoDB. We report on the performance of a single node configuration of each data store and assume the database is small enough to fit in main memory. We analyze utilization of the CPU cores and the network bandwidth to compare the two data stores. Our key findings are as follows. First, for those social networking actions that read and write a small amount of data, the join operator of the SQL solution is not slower than the JSON representation of MongoDB. Second, with a mix of actions, the SQL solution provides either the same performance as MongoDB or outperforms it by 20%. Third, a middle-tier cache enhances the performance of both data stores as query result lookup is significantly faster than query processing with either system.

A Introduction

There is an abundance of data stores with both the computer industry and the research arena contributing novel architectures and data models. In [10], Cattell surveys and classifies 22 data stores to motivate a quantitative analysis of the alternative designs and implementations. We study a specific aspect of this vast multi-faceted topic, namely, a comparison of an industrial strength relational database management system (RDBMS) named SQL-X (due to a licensing agreement, we cannot disclose the identity of this system) and a NoSQL document store named MongoDB. While SQL-X implements a relational data model [12], MongoDB implements a
JSON representation of data [14]. Each offers a rich set of design choices. We use the BG [5] benchmark to exercise the different capabilities of each data store. This social networking benchmark consists of a database and eleven actions (see Table 1) that either read or write a small amount of data from the database.

While SQL-X does not scale horizontally, MongoDB scales to a large number of nodes. In addition to impacting the performance of a single node instance of each data store, physical organization of data impacts the horizontal scalability of MongoDB. While both are important, we focus on the performance of a single node instance of each data store for the following reasons. First, it provides insights into the tradeoffs associated with two alternative logical data designs, namely, relational and JSON. An interesting finding is that the use of the join operator is not slower than the JSON representation, see Section D.

Second, while BG's interactive social networking actions are simple, they interact in complex ways to offer a wide range of design choices. We show it is beneficial to move the work of read actions to write actions when the workload is dominated by read actions. (According to Facebook, queries constitute more than 99% of their workload [3, 28].) Materialized views are not appropriate because they provide either very low performance or a high amount of stale data, see Section E.

Figure 1: Basic SQL-X database design, consisting of the tables Members, Friends, Resource, and Manipulation. The underlined attribute(s) denote the primary key of a table. Attributes with a hat denote the indexed attributes.

Third, BG's social networking actions impose a small amount of work on a node and should be processed by a single node of a multi-node data store. Otherwise, the overhead of parallelism limits the scalability of a data store [19, 15, 33]. By understanding factors that enhance the performance of a single node, we provide a solid foundation to investigate alternative designs that impact the scalability of a data store. These alternatives include partitioning strategies, replication, and secondary indexes [15, 29].

Our primary contribution is a quantitative comparison of two different data store architectures to provide insights into their working operations. Data store architects may use these results to enhance the performance of their existing data store for social networking actions. Social networking sites may use these results to fine tune the performance of their existing deployments. For example, when computing the friends of a member [3], the obtained results suggest that a join of two tables might be fast enough as long as images are represented effectively, see Section C.

Related work: An experimental comparison of RDBMS solutions with the NoSQL systems for interactive data-serving environments and decision support workloads is presented in [18]. Its key finding is that the SQL systems provide significant performance advantages and that NoSQL systems are fairly competitive in many cases. It employs the YCSB benchmark [13] for its evaluation of interactive environments. Our evaluation focuses on interactive social networking actions and considers a richer conceptual data model and workload. We explore a subset of a large space of possibilities including the use of caches, quantifying their tradeoffs.

Figure 2: Basic MongoDB design of BG's database.

The rest of this paper is organized as follows. Section B provides an overview of the BG social networking benchmark. It includes an organization of data, termed Basic, with both SQL-X and MongoDB. Sections C to F present physical design enhancements to the Basic design of each system. This discussion includes a quantitative analysis to identify the best design decisions (termed Boosted) for each system. We use these designs to compare SQL-X with MongoDB in Section G. Brief conclusions and future research directions are presented in Section I.

B Overview of BG Benchmark

BG [5] is a benchmark to rate data stores for interactive social networking actions and sessions. These actions and sessions either read or write a very small amount of the entire data set. BG uses a thread to emulate a member of a social networking site viewing either her own profile or that of another member, listing either her friends or those of another member, inviting another member to be friends, viewing her top-k resources (image, posting), and others. The first column of Table 1 lists all actions supported by BG. These actions are common to sites such as Facebook, LinkedIn, Twitter, FourSquare, and others [5].

BG models a database consisting of a fixed number of members (M) with a registered profile. Each member profile may consist of either zero or 2 images. With the latter, one image is a thumbnail and the second is a higher resolution image. While thumbnails are displayed when listing friends of a member, the higher resolution image is displayed when a member visits a user's profile. An experiment starts with a fixed number of friends (φ) and resources per member (ρ). This study assumes a database of 10,000 profiles with 2 KByte thumbnail images and 12 KByte profile images. We also consider databases with no images. All experiments start with 100 friends (footnote 2) and resources per user, φ = ρ = 100. (Loading the database is slightly faster with MongoDB than with SQL-X [7].) We have conducted experiments with a 100K member database. The reported observations and trends do not change as long as the benchmark database is smaller than the server memory.

BG Social Actions                        Type   Very Low       Low          High
                                                (0.1%) Write   (1%) Write   (10%) Write
View Profile (VP)                        Read   40%            40%          35%
List Friends (LF)                        Read   5%             5%           5%
View Friends Requests (VFR)              Read   5%             5%           5%
Invite Friend (IF)                       Write  0.02%          0.2%         2%
Accept Friend Request (AFR)              Write  0.02%          0.2%         2%
Reject Friend Request (RFR)              Write  0.03%          0.3%         3%
Thaw Friendship (TF)                     Write  0.03%          0.3%         3%
View Top-K Resources (VTR)               Read   40%            40%          35%
View Comments on Resource (VCR)          Read   9.9%           9%           10%
Post Comment on a Resource (PCR)         Write  0%             0%           0%
Delete Comment from a Resource (DCR)     Write  0%             0%           0%

Table 1: Three mixes of social networking actions.

BG computes a Social Action Rating (SoAR) of a data store based on a pre-specified service level agreement (SLA) by manipulating the number of threads (i.e., emulated members) that perform actions simultaneously. SoAR is the maximum system throughput (actions per second) that satisfies the SLA. All SoAR ratings in this paper are established with the following SLA: 95% of requests observe a response time of 100 milliseconds or faster with unpredictable (stale) data lower than 0.1%. An ideal physical data design is one that maximizes SoAR of a system. Data designs using materialized views and cache augmented data stores may produce stale data. The former is because the RDBMS may propagate updates to the materialized view asynchronously. The latter is due to write-write race conditions between the data store and the cache [20].
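
The SLA check and the SoAR definition above can be sketched as follows. This is a minimal illustration, not BG's implementation: the function names, the shape of the `runs` input, and the idea of scanning a list of already measured runs are assumptions made for the example.

```python
def satisfies_sla(response_times_ms, stale_reads, total_reads,
                  max_latency_ms=100.0, percentile=0.95, max_stale=0.001):
    """True when at least 95% of requests finish within 100 ms and the
    fraction of unpredictable (stale) reads is below 0.1%."""
    if not response_times_ms:
        return False
    fast = sum(1 for t in response_times_ms if t <= max_latency_ms)
    fast_enough = fast / len(response_times_ms) >= percentile
    stale_fraction = stale_reads / total_reads if total_reads else 0.0
    return fast_enough and stale_fraction < max_stale


def soar(runs):
    """runs holds one (throughput, response_times_ms, stale_reads, total_reads)
    tuple per thread count tried (hypothetical input format). SoAR is the
    highest throughput among runs that satisfy the SLA, or 0 if none does."""
    passing = [throughput for throughput, times, stale, total in runs
               if satisfies_sla(times, stale, total)]
    return max(passing, default=0)
```

A returned value of 0 corresponds to the case where a design fails to satisfy the SLA at every load level tried.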

Figure 1 shows the relational design of BG's database. The underlined attributes are the primary keys of the identified tables. Index structures are constructed on these attributes to facilitate efficient processing of read actions. For example, with a view profile action referencing a member with a specific userid, say 5, a hash index facilitates efficient retrieval of the Member corresponding to this userid. The Members table may store images as BLOBs. Alternatives are discussed in Section C. Computing either the list of friends or the list of pending friends requires a join between the Members and Friends tables. Section E explores the use of materialized views and their alternatives to migrate the work of read actions to write actions for computing simple analytics. We report SoAR of these designs with SQL-X.

Figure 2 shows the JSON design of BG's database tailored for use with MongoDB. For each member M_i, this design maintains three different arrays: 1) pendingFriends maintains the id of members who have extended a friend invitation to M_i, 2) confirmedFriends maintains the id of members who are friends with M_i, and 3) wallResourceIds maintains the id of resources (e.g., images) posted on M_i's profile. One may store the profile and thumbnail image of each member either in the file system, MongoDB's GridFS, or as an array of bytes. Figure 2 shows the last two choices. When images are stored in the GridFS, the profileimageid and thumbnailimageid are stored as attributes of the Members collection (instead of the array of bytes shown in Figure 2). Section C shows one design provides a SoAR significantly higher than the other two.

Footnote 2: Median Facebook friend count is 100 [35, 4].

In the next 3 sections, we provide additional details about BG's actions and their implementation using both the relational and JSON representations. We discuss changes to the physical organization of data and their impact on the SoAR of SQL-X and MongoDB. We analyze SoAR of SQL-X with different mixes of actions, see Table 1. Post Comment and Delete Comment actions are eliminated because we have no improved designs to offer for these actions.

To simplify discussion, this paper classifies BG's actions into those that either read or write data. A read action is one that queries data and retrieves data items without updating them. A write action is one that either inserts, deletes, or updates data items. Column 2 of Table 1 identifies different read and write actions.

All reported SoAR numbers are based on a dedicated hardware platform consisting of six PCs connected using a Gigabit Ethernet switch. Each PC consists of a 64 bit 3.4 GHz Intel Core i7-2600 processor (4 cores with 8 threads) configured with 16 GB of memory, 1.5 TB of storage, and one Gigabit/second networking card (footnote 3). One node acts as our data store server (either MongoDB or SQL-X) at all times (footnote 4). All other nodes are used as BGClients to generate workload for this node. With all reported SoAR values greater than zero, either the disk, all cores, or the networking card of the server hosting a data store becomes fully utilized. We report on the use of two networking cards to eliminate the network as a limiting resource. When SoAR is reported as zero, this means a design failed to satisfy the SLA.

C Manage Images Effectively

There is folklore that an RDBMS efficiently handles a large number of small images, while file systems are more efficient for storage and retrieval of large images [31]. With BG, we show physical organization of profile and thumbnail images in a data store impacts its SoAR dramatically. For example, if thumbnail images are not stored as a part of the profile structure representing a member then the performance of the system for processing the List Friends (LF) action is degraded significantly. This holds true with both MongoDB and SQL-X. Performance of SQL-X is further enhanced when profile images are stored in the file system. The same does not hold true with MongoDB. Below, we provide experimental results to demonstrate these observations.

The LF action of BG retrieves the thumbnail image and the profile information of each friend of a member (see attributes shown in Figure 1). Figure 3 shows the SoAR of LF with SQL-X and MongoDB with 100 friends per member. While SQL-X performs a join between two tables (Members and Friends of Figure 1) to perform this action, MongoDB looks up an array of member identifiers (confirmedFriends of Figure 2 for the referenced Member JSON instance) and retrieves the JSON object for each member. With SQL-X, we consider thumbnails stored in either the file system or inline with the record representing the member. With MongoDB, we consider thumbnails stored in its Grid File System (GridFS) or as an array of bytes in the JSON-like representation of a member. With both systems, storing the thumbnail image as a part of the Member profile enhances SoAR of the system from zero to a few hundred. In these experiments, the CPU of the data store becomes 100% utilized. Note that, with a single node, the join operation of SQL-X is not necessarily slower than MongoDB's processing of the confirmedFriends array to retrieve documents corresponding to the friends of the member.

Footnote 3: In some experiments, the server hosting either SQL-X or MongoDB is configured with two networking cards, each is a one Gigabit/second card.
Footnote 4: The same node is used as either SQL-X or MongoDB server in all experiments.

Figure 3: SoAR (actions/second) of List Friends (LF) with different organization of 2 KB thumbnail image, M=10K, φ=100. SQL-X: inline BLOB 395, FS 0. MongoDB: inline array of bytes 296, GridFS 0.

Figure 4: SoAR (actions/second) of SQL-X for processing a workload consisting of 100% View Profile (VP) action with images stored as either BLOBs or in the FS, M=10K, φ=100. 2 KB images: BLOB 37,902, FS 12,300. 12 KB images: BLOB 182, FS 7,695.

The performance of SQL-X for processing the View Profile (VP) action of BG is enhanced when large profile images are not stored in the RDBMS, see Figure 4. An alternative is to store them in the file system with a member record maintaining the name of the file containing the corresponding profile image [31, 8]. Figure 4 shows the SoAR of SQL-X with these two alternatives for two different image sizes: 2 KB and 12 KB. (As a comparison, with no images, SoAR of SQL-X is 119,746 for this workload.) A small image size, 2 KB, enables SQL-X to store the image inline with the member record, outperforming the file system by a factor of 3. SQL-X stores images inline as long as they are smaller than 4 KB. Beyond this, for example with our assumed 12 KB image sizes, the performance of SQL-X diminishes dramatically, enabling the file system to outperform it by more than a factor of 40.

MongoDB's GridFS provides effective support for images. Its SoAR is comparable to storing these images in the file system. It outperforms the file system by more than a factor of two with very large profile images, e.g., 500 KB. It is worth noting that SQL-X outperforms MongoDB with image sizes smaller than 4 KB by inlining them in profile records. Beyond this limit, MongoDB outperforms SQL-X. Similar to the thumbnail discussions, if profile image sizes are known to be small in advance then one may inline them with MongoDB by representing them as an array of bytes in the Members collection, see Figure 9. Key considerations include MongoDB's limit of 16 Megabytes for the size of a document and the impact of large documents on actions that do not require the retrieval of the profile image. For example, the List Friends (LF) action does not require the profile image. MongoDB provides an interface to remove some attribute values of a document while constructing a query. For example, one may query the Members collection for a document with userid 1 and not retrieve the profile image of the qualifying document by passing a projection as the second argument: db.member.find({"userid": 1}, {"profileimage": false}).

Member 1's number of friends
  One record per friendship:   SELECT count(*) FROM Friends WHERE (inviterID=1 or inviteeID=1) and status='C'
  Two records per friendship:  SELECT count(*) FROM Friends WHERE inviterID=1 and status='C'
Member 1's list of friends
  One record per friendship:   SELECT m.* FROM Members m, Friends f WHERE ((f.inviterID=1 and m.userid=f.inviteeID) or (f.inviteeID=1 and m.userid=f.inviterID)) and f.status='C'
  Two records per friendship:  SELECT m.* FROM Members m, Friends f WHERE f.inviterID=1 and f.status='C' and m.userid=f.inviteeID
Member 1 invites Member 2
  Both representations:        INSERT INTO Friends values (1, 2, 'P')
Member 2 accepts Member 1's invitation
  One record per friendship:   UPDATE Friends SET status='C' WHERE inviterID=1 and inviteeID=2
  Two records per friendship:  1. UPDATE Friends SET status='C' WHERE inviterID=1 and inviteeID=2
                               2. INSERT into Friends (inviteeID, inviterID, status) values (1, 2, 'C')
Member 2 rejects Member 1's invitation
  Both representations:        DELETE FROM Friends WHERE inviterID=1 and inviteeID=2 and status='P'
Member 1 thaws friendship with Member 2
  Both representations:        DELETE FROM Friends WHERE ((inviterID=1 and inviteeID=2) or (inviterID=2 and inviteeID=1)) and status='C'

Table 2: One record and two record representation of a friendship with one table, Friends table of Figure 1.

D Friendship

The concept of friendship between two members is central to a social networking site. Most of BG's actions model this concept, see Table 2. An important consideration is how to represent the thumbnail image of each member listed as a friend of a referenced member. This was discussed in Section C. Hence, this section focuses on a BG database configured with no images.

Member 1's number of friends
  One record per friendship:   SELECT count(*) FROM Frds WHERE frdID1=1 or frdID2=1
  Two records per friendship:  SELECT count(*) FROM Frds WHERE frdID1=1
Member 1's list of friends
  One record per friendship:   SELECT m.* FROM Members m, Frds f WHERE ((f.frdID1=1 and m.userid=f.frdID2) or (f.frdID2=1 and m.userid=f.frdID1))
  Two records per friendship:  SELECT m.* FROM Members m, Frds f WHERE f.frdID1=1 and m.userid=f.frdID2
Member 1 invites Member 2
  Both representations:        INSERT INTO PdgFrds values (1, 2)
Member 2 accepts Member 1's invitation
  One record per friendship:   1. DELETE FROM PdgFrds WHERE inviterID=1 and inviteeID=2
                               2. INSERT into Frds (frdID1, frdID2) values (1, 2)
  Two records per friendship:  1. DELETE FROM PdgFrds WHERE inviterID=1 and inviteeID=2
                               2. INSERT into Frds (frdID1, frdID2) values (1, 2), (2, 1)
Member 2 rejects Member 1's invitation
  Both representations:        DELETE FROM PdgFrds WHERE inviterID=1 and inviteeID=2
Member 1 thaws friendship with Member 2
  Both representations:        DELETE FROM Frds WHERE (frdID1=1 and frdID2=2) or (frdID1=2 and frdID2=1)

Table 3: One record and two record representation of a friendship with two tables, Frds and PdgFrds tables of Figure 8.


D.1 Relational Design: A Tale of One or Two

With a relational design, one may represent pending and confirmed friendships as either one or two tables. With each alternative, a friendship might be represented as either one or two rows. We elaborate on these designs below. Subsequently, we establish their SoAR. Obtained results show that a two table design is superior to a one table design.

Figure 1 shows a design that employs one table. It employs an attribute named "status" to differentiate between pending and confirmed friendships: A 'C' value denotes a confirmed friendship while a 'P' value denotes a pending friendship. The second column of Table 2 shows the SQL commands issued to implement the alternative BG actions with this design. Note the use of disjuncts ("or") in the qualification list of the SQL queries. A designer may simplify these queries and eliminate disjuncts by representing a friendship with two records. The resulting queries are shown in the third column of Table 2. The design changes the implementation of the Accept Friend Request action (fourth row of Table 2) into a transaction consisting of two SQL statements. In our implementation, all transactions are implemented as stored procedures in SQL-X.
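
The difference between the one record and two record representations can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than SQL-X, and the table names FriendsOne and FriendsTwo are hypothetical, introduced only to hold the two representations side by side. The queries follow Table 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FriendsOne (inviterID INT, inviteeID INT, status TEXT);
    CREATE TABLE FriendsTwo (inviterID INT, inviteeID INT, status TEXT);
""")


def accept_one_record(inviter, invitee):
    # One row per friendship: flip the pending row to confirmed.
    with conn:
        conn.execute("UPDATE FriendsOne SET status='C' "
                     "WHERE inviterID=? AND inviteeID=?", (inviter, invitee))


def accept_two_records(inviter, invitee):
    # Two rows per friendship: confirm the pending row and insert the
    # mirrored row in a single transaction (committed by `with conn`).
    with conn:
        conn.execute("UPDATE FriendsTwo SET status='C' "
                     "WHERE inviterID=? AND inviteeID=?", (inviter, invitee))
        conn.execute("INSERT INTO FriendsTwo VALUES (?, ?, 'C')",
                     (invitee, inviter))


def count_friends_one(member):
    # Needs a disjunct because the member may appear in either column.
    return conn.execute(
        "SELECT count(*) FROM FriendsOne "
        "WHERE (inviterID=? OR inviteeID=?) AND status='C'",
        (member, member)).fetchone()[0]


def count_friends_two(member):
    # No disjunct: every friendship has a row with the member as inviterID.
    return conn.execute(
        "SELECT count(*) FROM FriendsTwo WHERE inviterID=? AND status='C'",
        (member,)).fetchone()[0]


# Members 1 and 3 each invite member 2, who accepts both invitations.
for inviter in (1, 3):
    for table in ("FriendsOne", "FriendsTwo"):
        conn.execute(f"INSERT INTO {table} VALUES (?, 2, 'P')", (inviter,))
    accept_one_record(inviter, 2)
    accept_two_records(inviter, 2)
```

Both counting functions return the same answer for every member; the two record variant trades an extra row and a two statement accept for simpler read queries.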

An alternative to the one table design is to employ two different tables and separate pending friend invitations from confirmed invitations, see the physical design of Figure 8 and the queries of Table 3. This eliminates the "status" attribute of the one table design. However, the data designer still faces the dilemma of representing a friendship as either one row or two rows in the table corresponding to the confirmed friends. The second and third columns of Table 3 show the SQL commands with these two possibilities. A key difference is that SQL queries are simpler with the two record design.

When comparing the alternative designs, the two record design requires more storage space than the one record design. However, its resulting SQL queries are simpler to author and reason about. With one user issuing requests (single threaded BG), the larger number of records does not impact the service time of issued queries and update commands because index structures facilitate retrieval and manipulation of the relevant records. In a multi-user setting with a mix of read and write actions, see Table 1, the two table design outperforms the one table design when the frequency of write actions is high enough to result in conflicts. Figure 5 shows SoAR of these two alternatives with each friendship represented as two records. Observed SoAR with a mix of very low (0.1%) write actions is almost identical for the two designs due to the use of index structures and a low conflict rate. With a mix of high (10%) write actions, the two table design outperforms the one table design by more than 30%. We speculate this is due to the ACID properties of transactions slowing down the one table design as it is used concurrently to process both pending and confirmed friendship transactions. The two table design reduces this contention among concurrently executing actions. For example, the query to compute the number of pending friend invitations for a member is no longer blocked by the transaction that thaws friendship between two members.

Figure 5: SoAR (actions/second) of SQL-X with either one or two tables for pending and confirmed friendships with two workloads, M=10K and φ=100. Each friendship is represented as two records. 0.1% write: 1 table 22,781, 2 tables 22,830. 10% write: 1 table 13,424, 2 tables 17,887.

D.2 MongoDB: List Friends

With MongoDB, BG's List Friends (LF) action is most interesting because it must retrieve the documents pertaining to the friends of a referenced member. These can be retrieved either one document at a time or all documents at once. With the former, LF is implemented by issuing a query to retrieve the basic profile information for each confirmed friend. With the latter, the entire list of friends is used with the $in operator to construct the query issued to MongoDB. This operator selects all the documents whose identifiers match the values provided in the list. With an underutilized system (a few BG threads), the second approach provides a response that is approximately 1.5 times faster than the first. This is because the first approach incurs the overhead of issuing multiple queries across the network for each document. The SoAR of these two alternatives is almost identical because the CPU of the server hosting MongoDB becomes 100% utilized.
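
The round trip difference between the two retrieval strategies can be illustrated with a small in-memory model. The CountingStore class below is a hypothetical stand-in written for this example; it is not MongoDB's client API, and its find_in method only models a single query carrying an $in-style list of identifiers.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a document collection that
    counts how many queries (round trips) a client issues."""

    def __init__(self, docs):
        self._docs = {d["userid"]: d for d in docs}
        self.round_trips = 0

    def find_one(self, userid):
        self.round_trips += 1
        return self._docs.get(userid)

    def find_in(self, userids):
        # One query that matches every identifier in the supplied list.
        self.round_trips += 1
        return [self._docs[u] for u in userids if u in self._docs]


def list_friends_one_at_a_time(store, member_id):
    # One query for the member, then one query per confirmed friend.
    member = store.find_one(member_id)
    return [store.find_one(f) for f in member["confirmedFriends"]]


def list_friends_batched(store, member_id):
    # One query for the member, then one query for all friends at once.
    member = store.find_one(member_id)
    return store.find_in(member["confirmedFriends"])
```

With φ friends per member, the first function issues φ + 1 queries and the second issues 2, which is consistent with the lower response time reported above for an underutilized system.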

MongoDB supports a host of write concerns, see [27] for details. We investigate two, termed normal and safe in MongoDB's documentation. Both are implemented by MongoDB's Java client. The normal write concern returns control once the write is issued to the driver of the client. The safe write concern returns control once it receives an acknowledgment from the server. With a low system load (BG with one thread), the normal write concern improves the average response time of MongoDB by 13%. It does not, however, improve the processing capability of the MongoDB server and has no impact on its SoAR when compared with the safe write concern. Moreover, it produced a very low (< 0.1%) amount of unpredictable reads.

E Migrate Work of Reads to Writes

Due to a high read to write ratio of the workload of social networking sites [28], one may enhance the average service time of the system by migrating the workload of reads to writes. With RDBMSs, one way to realize this is by using materialized views, MVs. Section E.1 discusses this approach and shows that it slows down write actions so dramatically that it is difficult to argue they are interactive. It presents an alternative named Manual that does not suffer this limitation. However, Manual requires additional software and incurs the overhead of a development life cycle.

Figure 6: SoAR with the Basic SQL-X design of Figure 1, materialized views (MV) for aggregates as attributes with both synchronous and asynchronous mode of refresh, and developer maintained (Manual) aggregates as attributes, M=10K, φ=100, BG database has no images. Workloads: 0.1% writes and 10% writes; y-axis: SoAR (actions/second). Bar labels as extracted: MV Asynch (15,630), MV Asynch (20,092), MV Synch (665), MV Synch (0), Manual (23,733), Manual (16,221), Basic (13,224), Basic (22,781).

Figure 7: Alternative cache augmented SQL architectures. 7.a) Client-Server (CS) architecture. 7.b) Shared Address Space (SAS) architecture.

E.1 Read Mostly Aggregates as Attributes

Social networking sites present their members with individualized "small analytics" [32]. These are aggregate information such as a member's number of friends. BG models these using its View Profile (VP) action that provides each member with her count of resources, friends, and pending friend invitations. One may implement these in two ways: 1) Compute the aggregates each time the VP action is invoked, 2) Store the value of aggregates, look them up to process VP, and maintain them up to date in the presence of write actions that impact their value. An example SQL query that implements the former is illustrated in the first row of Table 2. The latter migrates the workload of read actions to write actions. It is appropriate when write actions are infrequent. Below, we present two alternatives to implement the second approach.

One may use Materialized Views (MVs) of SQL-X to store the value of BG's simple analytics and require the RDBMS to maintain their value up to date. This was implemented as follows. First, we define one MV for each aggregate of the VP action. The resulting 3 views have two columns: user-id and the corresponding aggregate attribute value. Next, we author a MV that joins these three views with the original Members table (using the user-id attribute value), implementing a table that consists of each member's attributes along with 3 additional attribute values representing each aggregate for that member. This table is queried by the VP action to look up the value of its simple analytic instead of computing it.

One may configure SQL-X to refresh MVs either synchronously or asynchronously in the presence of updates. The asynchronous refresh is in the order of hours, causing the MV to contain stale data. BG quantifies these as unpredictable reads. Below, we discuss this in combination with the observed SoAR.

With no profile image and a read workload that invokes the VP action only, the authored MV improves SoAR of SQL-X by more than a factor of six, from 19,020 to 119,746 actions per second. With infrequent (0.1%) writes, asynchronous mode of processing updates enables MVs to enhance SoAR of SQL-X by almost a factor of two, see Figure 6. However, this causes 31% of read actions to observe unpredictable (stale) data. The amount of unpredictable data increases to 72% with a high frequency (10%) of write actions, enhancing SoAR of SQL-X by a modest 11%.

The synchronous refresh mode of MVs eliminates unpredictable data. However, as shown in Figure 6, it diminishes SoAR of SQL-X dramatically. This is because it slows down write actions. As an example, the service time of the Accept Friend Request write action is slowed down from 1.7 milliseconds to 1.94 seconds (footnote 5) with an under-utilized system, i.e., one BG thread. These service times are not interactive, rendering MVs inappropriate for BG's workload.

An alternative to MVs, named Manual, is for a software developer to implement aggregates as attributes by extending the Members table with 3 additional columns, one for each aggregate. When a member registers a profile, these attribute values are initialized to zero. The developer authors additional software (either in the application software or in the RDBMS in the form of stored procedures and triggers) for the write actions that impact these attribute values to update them by either incrementing or decrementing their values by one. For example, the developer extends a write action that invites Member 1 to be friends with Member 2 to increment the number of pending friends for Member 1 by one as a part of the transaction that updates the Friends table, see Section D.
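
The Manual approach can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the SQL-X implementation: the column names friend_count, pending_count, and resource_count are hypothetical, and the counter maintained here is the one belonging to the member who receives the invitation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (userid INTEGER PRIMARY KEY,
                          friend_count INT DEFAULT 0,
                          pending_count INT DEFAULT 0,
                          resource_count INT DEFAULT 0);
    CREATE TABLE Friends (inviterID INT, inviteeID INT, status TEXT);
    INSERT INTO Members (userid) VALUES (1), (2);
""")


def invite_friend(inviter, invitee):
    # Write action: insert the pending row and increment the invitee's
    # stored aggregate in the same transaction, so the two never diverge.
    with conn:
        conn.execute("INSERT INTO Friends VALUES (?, ?, 'P')",
                     (inviter, invitee))
        conn.execute("UPDATE Members SET pending_count = pending_count + 1 "
                     "WHERE userid=?", (invitee,))


def view_profile(userid):
    # Read action: a single lookup replaces the profile query plus the
    # three count(*) queries.
    return conn.execute(
        "SELECT userid, friend_count, pending_count, resource_count "
        "FROM Members WHERE userid=?", (userid,)).fetchone()
```

Because the counter update commits together with the insert, a subsequent view_profile call sees the new value without recomputing any aggregate.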

Manual speeds up the VP action by transforming 4 SQL queries into one. The four queries include retrieval of the referenced member's profile attribute values, count of friends, count of pending friend invitations, and count of resources. In our experiments, Manual enhanced SoAR of SQL-X for processing the VP action by the same amount as MVs with asynchronous update. However, it produces no stale reads. When compared with Basic, Manual provides at most a 22% improvement as the VP action constitutes 35% to 40% of the workload, see Table 1.

A drawback of Manual is the additional software and its associated software development life cycle (design, implementation, testing and debugging, and maintenance). Its key advantages include interactive response times for both the read and write actions with no unpredictable reads.

⁵ A 1,141-fold slowdown.


F Cache Augmented Database Management Systems, CADBMS

With both MongoDB and SQL-X, a developer may avoid issuing a query to the data store by caching its output (the value) given its unique input (the key). This is the main motivation for middle-tier caches [23, 11, 36, 17, 16, 25, 1, 2, 30, 22, 28, 21]. This section focuses on a specific subclass that employs in-memory Key-Value Stores (KVS) with a simple put, get, delete interface [21, 28]. Its use case is as follows. The developer modifies each read action to convert its input to a key and use this key to look up the KVS for a value. If the KVS returns a value then the value is produced as the output of the action without executing the main body of the read action that issues data store queries. Otherwise, the body of the read action executes, issues data store queries to compute a value (i.e., output of the read action), stores the resulting key-value pair in the KVS for future use, and returns the output to BG.
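The read path described above can be sketched as follows. This is an illustration under stated assumptions: `InMemoryKVS` is a toy dictionary-backed stand-in for a put/get/delete store (it is not the memcached or Ehcache API), and the `action:member_id` key format and the `body` callback are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Optional


class InMemoryKVS:
    """Toy stand-in for a KVS exposing only put, get, and delete."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)

    def put(self, key: str, value: object) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def cached_read(
    kvs: InMemoryKVS,
    action: str,
    member_id: int,
    body: Callable[[int], object],
) -> object:
    # Convert the input of the read action to a key (assumed format).
    key = f"{action}:{member_id}"
    value = kvs.get(key)
    if value is not None:
        # Hit: return the cached output without touching the data store.
        return value
    # Miss: run the body that queries the data store, then cache its output.
    value = body(member_id)
    kvs.put(key, value)
    return value
```

In this sketch a cached value of `None` is indistinguishable from a miss, so a read action whose legitimate output is empty would need a sentinel value instead.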

The developer must modify each write action to invalidate key-value pairs that are impacted by its insert, delete, update command to the data store. For example, the write action that enables Member 1 to accept Member 2’s friendship request must invalidate 5 key-value pairs. These correspond to Member 1’s profile, list of friends and list of pending friends, and Member 2’s profile and list of friends.
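As a sketch of the invalidation step for the Accept Friend Request example, the function below enumerates the five impacted keys and removes them from a cache. The key prefixes (`Profile`, `Friends`, `PendingFriends`) are hypothetical, and a plain dictionary stands in for the KVS.

```python
from typing import Dict, List


def accept_friend_keys(acceptor: int, requester: int) -> List[str]:
    # The five key-value pairs impacted when `acceptor` accepts the
    # friendship request of `requester` (key names are assumptions).
    return [
        f"Profile:{acceptor}",
        f"Friends:{acceptor}",
        f"PendingFriends:{acceptor}",
        f"Profile:{requester}",
        f"Friends:{requester}",
    ]


def invalidate(cache: Dict[str, object], keys: List[str]) -> int:
    # Delete each impacted key if it is cached; return how many were removed.
    removed = 0
    for key in keys:
        if key in cache:
            del cache[key]
            removed += 1
    return removed
```

The requester's list of pending friends is deliberately absent: in the example above, accepting a request changes only the acceptor's pending list.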

The maximum number of unique key-value pairs is a function of the number of members/resources and read actions. With a database of 10,000 members, the View Profile action of BG may populate the KVS with 10,000 unique key-value pairs. With the View Comment on Resource (VCR) action and 100 resources per member, the KVS may consist of a million unique key-value pairs. The actual number of cached key-value pairs might be lower due to a skewed pattern of data access, e.g., a workload that employs a Zipfian distribution [6] to reference data items.
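The upper bounds quoted above follow from simple counting, sketched below under the assumption that View Profile caches one pair per member and View Comment on Resource caches one pair per resource. The function name and dictionary keys are illustrative only.

```python
from typing import Dict


def max_unique_keys(members: int, resources_per_member: int) -> Dict[str, int]:
    # Upper bound on distinct key-value pairs each read action can create.
    return {
        "ViewProfile": members,
        "ViewCommentOnResource": members * resources_per_member,
    }


bounds = max_unique_keys(members=10_000, resources_per_member=100)
print(bounds)
```

A skewed access pattern only lowers the number of pairs actually resident in the cache; it does not change these bounds.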

There are two categories of in-memory KVSs: Client-Server (CS) and Shared Address Space (SAS) [21], see Figure 7. With CS, the application server communicates with the cache via message passing. A popular CS KVS is memcached [26, 28]. With SAS, the KVS runs in the address space of the application. Examples include Terracotta’s Ehcache [34] and JBoss Cache [9]. SAS KVSs implement the concept of a transaction to atomically update all replicas of a key-value in different application instances. Both CS and SAS architectures may support replication of key-value pairs and implement consistent hashing to enhance availability of data and implement elasticity. A discussion of these topics is a digression from our focus. Instead, we focus on the performance of a single cache instance. With memcached, the cache server is a process hosted on a different server than the one hosting the data store. With Ehcache, the cache instance executes in the address space of the BGClient.

In the following, we focus on the impact of the KVS with a very low (0.1%) and a high (10%) frequency of writes. With these workloads, both MongoDB and SQL-X provide comparable SoARs as either the CPU or the network bandwidth of the server hosting the KVS becomes 100% utilized. Hence, without loss of generality, we present SoARs observed with SQL-X using either memcached or Ehcache.

Table 5 presents SoAR of the alternative designs when the database is configured with either no images or 12 KB profile image sizes with two different mixes of workloads. These results show Ehcache provides the highest SoAR, outperforming memcached by more than a factor of 13 (5) with images (no images). This is because it runs in the same address space as the BGClient, avoiding the overhead of transmitting key-value pairs across the network and deserializing them. In these experiments, the four-core CPU of the server hosting BGClient (and the Ehcache) becomes 100% utilized, dictating the overall system performance. (This bottleneck explains why there is no difference between SQL-X and MongoDB once extended with Ehcache.) It is interesting to note that the SoAR of Ehcache with 12 KB images is almost half of that with no images. This is due to network transmission of images for invalidated key-value pairs, increasing network utilization from 30% to 88%.

[Figure 8: relations Members, Frds, PdgFrds, Resource, and Manipulation]

Figure 8: Boosted SQL-X database design with profile images stored in the file system and thumbnail images as inline blobs. One record in the Frds table represents the friendship between two members. The underlined attribute(s) denote the primary key. Attributes with a hat denote the indexed attributes.

With memcached, the four-core CPU of its server becomes 100% utilized when there are no images, dictating its SoAR rating. With 12 KB profile images, the network bandwidth becomes 100% utilized, dictating SoAR of memcached. In these experiments, memcached could produce key-value pairs at a rate of up to 2 Gbps as its server was configured with two Gbps networking cards.

G A Comparison

Table 5 shows SoAR of the Basic SQL-X and MongoDB data designs when compared with their Boosted alternatives. (See Figures 1 and 8 (2 and 9) for the Basic and Boosted SQL-X (MongoDB) data designs.) Boosted incorporates all of the best practices presented in the previous sections except for the use of caches⁶. With both SQL-X and MongoDB, the Basic data design is inferior to the Boosted alternative because it is inefficient and utilizes its 4-core CPU fully.

With Boosted and no images, the CPU of the server hosting the data store becomes 100% utilized, dictating its SoAR. This is true with both SQL-X and MongoDB and the two workloads, 0.1% and 10% frequency of writes. These results suggest SQL-X processes BG’s workload more efficiently than MongoDB because its SoAR rating is twofold higher.

With 12 KB profile images, both SQL-X and MongoDB continue to utilize their CPU fully with the Basic data design. With Boosted, the network becomes 100% utilized and dictates their SoAR rating. These results suggest SQL-X transmits less data than MongoDB to process BG’s workload because its SoAR rating is 50% higher.

⁶ The presented SoAR for memcached and Ehcache use the Boosted data design.


Figure 9: Boosted MongoDB design of BG’s database.

We have conducted experiments with a 100K member database. The reported trends and observations hold true for this and other databases as long as the available server memory is larger than the size of the benchmark database.

H Break-Even point

The average service time of a data store is a function of the mix of read and write actions that constitute its workload. Both the use of caches (KVS of Section F) and an implementation of aggregates as attributes (Manual of Section E.1) slow down write actions in order to speed up read actions. This is beneficial as long as both the speedup and the frequency of read actions are high enough to compensate for the slowdown observed with write actions. Otherwise, the migration of work may degrade (instead of enhance) overall system performance. An obvious question is how does the factor of slowdown interact with the factor of speedup? We answer this question by characterizing what factor of slowdown eclipses the observed speedup. This is the break-even point. A technique that slows down write actions by a higher factor than this break-even point is undesirable because it does not enhance system performance. Derivation of the break-even point is as follows.

  $\omega$     Frequency of write actions.
  $R$          Avg service time of read actions with the Basic design.
  $W$          Avg service time of write actions with the Basic design.
  $\bar{S}$    Avg service time of a workload.
  $\alpha$     Factor of speedup in $R$ with a change of data design.
  $\delta$     Factor of slowdown in $W$ with a change of data design.

Table 4: Parameters and their definitions

Consider the average service time of a lightly loaded system that does not incur queuing delays. Let $\omega$ and $W$ denote the frequency of write actions and their average service time, respectively. The probability of a read action is $(1-\omega)$. Assuming $R$ denotes the average service time of read actions, the average service time of the system, $\bar{S}$, is:

$$\bar{S} = \omega \times W + (1-\omega) \times R \qquad (1)$$

The values for $W$ and $R$ may vary with different physical data design changes such as those outlined in Sections E.1 and F.

Assume Equation 1 denotes average service time without a physical data design change. This is the Basic database design of Figure 1. A physical data design change (say aggregates as attributes of Section E.1) enhances the value of $R$ and increases $W$. Let $\alpha$ ($\delta$) denote the factor of speedup (slowdown) in the average read (write) service time observed with this physical data design change. The new average service time is:

$$\bar{S} = \omega \times \delta \times W + (1-\omega) \times \frac{R}{\alpha} \qquad (2)$$

One may solve for values of $\delta$ that cause Equations 1 and 2 to break even:

$$\delta = \frac{\alpha-1}{\alpha} \times \frac{1-\omega}{\omega} \times \frac{R}{W} + 1 \qquad (3)$$
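The algebra that leads from Equations 1 and 2 to Equation 3 can be spelled out as follows. The symbol names are chosen here for illustration: $\omega$ is the frequency of write actions, $R$ and $W$ are the average read and write service times with the Basic design, $\alpha$ is the factor of speedup in reads, and $\delta$ is the factor of slowdown in writes. Setting the two average service times equal and isolating $\delta$ gives:

```latex
\begin{align*}
\omega W + (1-\omega) R &= \omega \delta W + (1-\omega)\frac{R}{\alpha} \\
\omega W (\delta - 1) &= (1-\omega) R \left(1 - \frac{1}{\alpha}\right) \\
\delta - 1 &= \frac{\alpha-1}{\alpha} \times \frac{1-\omega}{\omega} \times \frac{R}{W} \\
\delta &= \frac{\alpha-1}{\alpha} \times \frac{1-\omega}{\omega} \times \frac{R}{W} + 1
\end{align*}
```

The trailing $+1$ reflects that with no read speedup ($\alpha = 1$) the only tolerable write slowdown is $\delta = 1$, i.e., none.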

For example, if the average service time of read actions with the Basic design is 150 msec ($R$=150 msec) and a physical data design technique, say use of memcached, enhances this time 8.6 fold ($\alpha$=8.6) then the same technique may slow down write actions by a factor of 23,381 and provide the same service time as the Basic database design when the frequency of write actions is 0.4% ($\omega$=0.004) and the average service time of an update is 1.4 msec ($W$=1.4 msec).

A value of $\delta$ higher (lower) than that computed using Equation 3 means the new database design is slower (faster) than the Basic design. In our example, values of $\delta$ greater than 23,381 imply that the Basic design is faster than using memcached.
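The break-even computation can be checked numerically with a short sketch. The function and parameter names below are illustrative, not from the paper. Plugging in the rounded inputs quoted in the text (150 msec reads, 1.4 msec writes, 8.6 fold speedup, 0.4% writes) yields a break-even slowdown of roughly 23,577, which is close to but not identical to the reported 23,381; presumably the reported figure was computed from unrounded measurements.

```python
def avg_service_time(
    write_freq: float,
    read_ms: float,
    write_ms: float,
    speedup: float = 1.0,
    slowdown: float = 1.0,
) -> float:
    # Equations 1 and 2: speedup = slowdown = 1 gives the Basic design.
    return write_freq * slowdown * write_ms + (1.0 - write_freq) * read_ms / speedup


def break_even_slowdown(
    write_freq: float, read_ms: float, write_ms: float, speedup: float
) -> float:
    # Equation 3: the write slowdown at which the changed design
    # has the same average service time as the Basic design.
    return (
        (speedup - 1.0) / speedup
        * (1.0 - write_freq) / write_freq
        * read_ms / write_ms
        + 1.0
    )


h = break_even_slowdown(write_freq=0.004, read_ms=150.0, write_ms=1.4, speedup=8.6)
basic = avg_service_time(0.004, 150.0, 1.4)
changed = avg_service_time(0.004, 150.0, 1.4, speedup=8.6, slowdown=h)
print(round(h), basic, changed)
```

At the computed slowdown the two average service times coincide, which is the defining property of the break-even point.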

Equation 3 quantifies several trivial and intuitive observations:

1. A proposed physical data design technique outperforms the Basic when write actions are rare, i.e., small values of $\omega$.

2. A proposed physical data design technique may incur a higher slowdown in service time of write actions (higher values of $\delta$) and continue to outperform the Basic design as long as it enhances the service time of read actions by a wider margin (higher values of $\alpha$). However, this observation has limits: a linear increase in the value of $\alpha$ does not compensate for a linear increase in the value of $\delta$. In our example, speeding read actions up (value of $\alpha$) from 8.6 to 100 fold enables a modest increase in slowdown of write actions (value of $\delta$) from 23,381 to 26,185 fold. Note that more than a 10 fold increase in $\alpha$ did not provide a 10 fold increase in $\delta$. And, this observation becomes exaggerated with larger values of $\alpha$. In our example, increasing $\alpha$ from 100 to 1000 fold facilitates a negligible increase of $\delta$ from 26,185 to 26,423 fold. Equation 3 shows this phenomenon with the component $\frac{\alpha-1}{\alpha}$ that approaches the value 1 (i.e., becomes irrelevant) with large values of $\alpha$.

3. The tolerable slowdown in write actions (value of $\delta$) is almost an inverse linear function of the frequency of write actions exhibited by a workload (value of $\omega$). Thus, given a workload, if its frequency of write actions is halved then the proposed physical data design may tolerate almost twice as much slowdown in average service time of its write actions and outperform the Basic design. With Equation 3, this is captured by the term $\frac{1-\omega}{\omega}$, which is close to $\frac{1}{\omega}$ when $\omega$ is small.

4. The average service time of read actions ($R$) and write actions ($W$) that constitute the workload impacts the tolerable slowdown in write actions ($\delta$) linearly. If read actions are more complex than write actions with a significantly higher average service time then a proposed physical data design may slow down write actions by higher factors and continue to outperform the Basic design. This is captured by $\frac{R}{W}$ in Equation 3.

The first observation is a property of social networking applications. While Observation 2 reasons about a proposed physical data design change, Observation 3 provides intuition about different workloads with varying frequency of write actions ($\omega$). Observation 4 focuses on the average service time of write and read actions with the RDBMS as they impact the overall average service time of both the Basic design and the proposed physical data design change.
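Observations 2 through 4 can be explored with a small parameter sweep over the break-even formula. The sketch below restates the formula so that it is self-contained; its function names and default values (the rounded inputs of the running example) are assumptions made for illustration, so its outputs track the trends in the text rather than reproduce the reported figures digit for digit.

```python
from typing import Dict, Iterable


def break_even_slowdown(
    write_freq: float, read_ms: float, write_ms: float, speedup: float
) -> float:
    # Break-even write slowdown for a given read speedup and workload.
    return (
        (speedup - 1.0) / speedup
        * (1.0 - write_freq) / write_freq
        * read_ms / write_ms
        + 1.0
    )


def sweep_speedup(
    speedups: Iterable[float],
    write_freq: float = 0.004,
    read_ms: float = 150.0,
    write_ms: float = 1.4,
) -> Dict[float, float]:
    # Observation 2: tolerable slowdown saturates as the speedup grows.
    return {
        g: break_even_slowdown(write_freq, read_ms, write_ms, g) for g in speedups
    }


by_speedup = sweep_speedup([8.6, 100.0, 1000.0])
# Observation 3: halving the write frequency roughly doubles the tolerance.
halved = break_even_slowdown(0.002, 150.0, 1.4, 8.6)
# Observation 4: the read-to-write service time ratio scales it linearly.
doubled_read = break_even_slowdown(0.004, 300.0, 1.4, 8.6)

for g, h in by_speedup.items():
    print(f"speedup {g:>7}: break-even slowdown {h:,.0f}")
print(f"halved write frequency: {halved:,.0f}")
print(f"doubled read time:      {doubled_read:,.0f}")
```

Raising the speedup from 100 to 1000 moves the break-even slowdown by less than one percent in this sweep, whereas halving the write frequency about doubles it.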

The simple analytical model of this section enables a database designer to reason about a proposed change in the physical organization of data to estimate whether it provides an enhancement for the overall system.

I Conclusion and Future Research

This experimental paper compares organization of a social networking database with two alternative data store architectures: an industrial strength SQL solution and a NoSQL document store named MongoDB. We used the BG benchmark for this comparison. The observed SoAR ratings are impacted by two key parameters of BG. First, the mix of actions that constitute the workload. Second, configuration of member profiles with either two images or no images. We analyzed alternative enhancements to both the relational representation of SQL-X and JSON representation of MongoDB. A summary of these enhancements is as follows, starting with SQL-X:

• Separate pending and confirmed friendships into two tables: This design modification provides a 33% improvement in performance with a mixed workload that consists of a high (10%) fraction of write actions.

• Store profile images that cannot be inlined in the file system: This improves SoAR of SQL-X 40 fold. Note that thumbnail images must be inlined and stored with the record representing a member. Otherwise, SoAR of SQL-X drops to zero.

• Represent simple analytics using aggregate functions as attribute values and maintain them up to date using additional software (either at the application level or as triggers).

A negative finding is that materialized views slow down the response time of write actions so dramatically that they can no longer be considered interactive.

                                   Basic    Basic     Boosted  Boosted
                                   SQL-X    MongoDB   SQL-X    MongoDB   memcached  Ehcache
  No Image             0.1% Write  12,322   6,512     33,694   14,665    55,634     271,760
                       10% Write   13,976   5,895     28,503   13,117    49,006     286,260
  12 KB Profile Image  0.1% Write  305      0         11,820   7,700     11,888     147,845
                       10% Write   300      0         10,977   7,438     10,271     144,672

Table 5: SoAR of alternative designs, $M$=10K, $\phi$=$\rho$=100.

With MongoDB, a key finding is to store the thumbnail image of a member as an array of bytes in the member’s JSON object. This results in a SoAR of seven thousand actions per second. If thumbnail images are stored in either MongoDB’s GridFS or the file system of the operating system, SoAR of MongoDB diminishes to zero.

With both MongoDB and SQL-X, one may extend the data store with a cache to look up query results instead of processing queries to compute results. We investigated both memcached and Ehcache. While both enhance system performance, the improvement is dramatic with Ehcache because it eliminates the overhead of transmitting key-values across the network and deserializing them.

When comparing the best SQL-X and MongoDB physical data designs, SoAR of SQL-X is 2.5 times higher than MongoDB when BG’s database is configured with no images. With both systems, the CPU of the server hosting the data store becomes 100% utilized. This suggests SQL-X processes BG’s workload more efficiently than MongoDB. When BG’s database is configured with 12 KB images, the network (2 Gbps) becomes 100% utilized with both data stores. In this scenario, SoAR of SQL-X is 30% higher than MongoDB. This suggests SQL-X transmits less data than MongoDB when processing BG’s workload.

A key feature of MongoDB, memcached, and Ehcache is their ability to horizontally scale to a large number of nodes. VoltDB is an SQL solution that also scales to a large number of nodes [24, 29]. An on-going research effort is to quantify the scalability of these systems using BG. This will explore a host of new physical data designs such as different partitioning strategies, replication, and secondary indexes [29]. It will include an interaction of these design choices with the boosted designs presented in this study.


J Acknowledgments

We thank Mark Callaghan and the anonymous reviewers of CIKM 2013 for their insights and valuable comments.

References

[1] C. Amza, A. L. Cox, and W. Zwaenepoel. A Comparative Evaluation of Transparent Scaling Techniques for Dynamic Content Servers. In ICDE, 2005.

[2] C. Amza, G. Soundararajan, and E. Cecchet. Transparent Caching with Strong Consistency in Dynamic Content Web Sites. In Supercomputing, ICS ’05, pages 264–273, New York, NY, USA, 2005. ACM.

[3] T. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. LinkBench: A Database Benchmark Based on the Facebook Social Graph. ACM SIGMOD, June 2013.

[4] L. Backstrom. Anatomy of Facebook, http://www.facebook.com/note.php?noteid=10150388519243859, 2011.

[5] S. Barahmand and S. Ghandeharizadeh. BG: A Benchmark to Evaluate Interactive Social Networking Actions. CIDR, January 2013.

[6] S. Barahmand and S. Ghandeharizadeh. D-Zipfian: A Decentralized Implementation of Zipfian. ACM SIGMOD DBTest Workshop, June 2013.

[7] S. Barahmand and S. Ghandeharizadeh. Expedited Rating of Data Stores Using Agile Data Loading Techniques. CIKM, 2013.

[8] D. Beaver, S. Kumar, H. Li, J. Sobel, and P. Vajgel. Finding a Needle in Haystack: Facebook’s Photo Storage. In OSDI. USENIX, October 2010.

[9] JBoss Cache. JBoss Cache, http://www.jboss.org/jbosscache.

[10] R. Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Rec., 39:12–27, May 2011.

[11] J. Challenger, P. Dantzig, and A. Iyengar. A Scalable System for Consistently Caching Dynamic Web Data. In the 18th Annual Joint Conference of the IEEE Computer and Communications Societies, 1999.

[12] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13(6), June 1970.

[13] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking Cloud Serving Systems with YCSB. In Cloud Computing, 2010.

[14] D. Crockford. The Application/JSON Media Type for JavaScript Object Notation (JSON). Internet Engineering Task Force (IETF), RFC 4627, July 2006.

[15] C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: A Workload-Driven Approach to Database Replication and Partitioning. VLDB, 3(1-2), September 2010.

[16] A. Datta, K. Dutta, H. Thomas, D. VanderMeer, D. VanderMeer, K. Ramamritham, and D. Fishman. A Comparative Study of Alternative Middle Tier Caching Solutions to Support Dynamic Web Content Acceleration. In VLDB, pages 667–670, 2001.


[17] L. Degenaro, A. Iyengar, I. Lipkind, and I. Rouvellou. A Middleware System Which Intelligently Caches Query Results. In IFIP/ACM International Conference on Distributed systems platforms, 2000.

[18] A. Floratou, N. Teletria, D. J. DeWitt, J. M. Patel, and D. Zhang. Can the Elephants Handle the NoSQL Onslaught? In VLDB, 2012.

[19] S. Ghandeharizadeh and D. DeWitt. Hybrid-Range Partitioning Strategy: A New Declustering Strategy for Multiprocessor Database Machines. VLDB, 1990.

[20] S. Ghandeharizadeh and J. Yap. Gumball: A Race Condition Prevention Technique for Cache Augmented SQL Database Management Systems. In Second ACM SIGMOD Workshop on Databases and Social Networks, Scottsdale, Arizona, 2012.

[21] S. Ghandeharizadeh and J. Yap. Cache Augmented Database Management Systems. In ACM SIGMOD DBSocial Workshop, June 2013.

[22] P. Gupta, N. Zeldovich, and S. Madden. A Trigger-Based Middleware Cache for ORMs. In Middleware, 2011.

[23] A. Iyengar and J. Challenger. Improving Web Server Performance by Caching Dynamic Data. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, pages 49–60, 1997.

[24] R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. Abadi. H-Store: a High-Performance, Distributed Main Memory Transaction Processing System. VLDB, 1(2), 2008.

[25] A. Labrinidis and N. Roussopoulos. Exploring the Tradeoff Between Performance and Data Freshness in Database-Driven Web Servers. The VLDB Journal, 2004.

[26] memcached. Memcached, http://www.memcached.org/.

[27] MongoDB. Class WriteConcern, http://api.mongodb.org/java/2.10.1/com/mongodb/WriteConcern.html.

[28] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling Memcache at Facebook. In Tenth USENIX Symposium on Networked Systems Design and Implementation, April 2013.

[29] A. Pavlo, C. Curino, and S. Zdonik. Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems. In SIGMOD, 2012.

[30] D. R. K. Ports, A. T. Clements, I. Zhang, S. Madden, and B. Liskov. Transactional Consistency and Automatic Management in an Application Data Cache. In OSDI. USENIX, October 2010.

[31] R. Sears, C. V. Ingen, and J. Gray. To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem. Technical Report MSR-TR-2006-45, Microsoft Research, 2006.

[32] M. Stonebraker. What Does ‘Big Data’ Mean? Communications of the ACM, BLOG@ACM, September 2012.

[33] M. Stonebraker and R. Cattell. 10 Rules for Scalable Performance in Simple Operation Datastores. Communications of the ACM, 54, June 2011.

[34] Terracotta. Ehcache, http://ehcache.org/documentation/overview.html.

[35] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. CoRR, abs/1111.4503, 2011.


[36] K. Yagoub, D. Florescu, V. Issarny, and P. Valduriez. Caching Strategies for Data-Intensive Web Sites. In VLDB, pages 188–199, 2000.
