An Evaluation of Alternative Physical Graph Data Designs for
Processing Interactive Social Networking Actions*
Shahram Ghandeharizadeh, Reihane Boghrati and Sumita Barahmand
Database Laboratory Technical Report 2014-10
Computer Science Department, USC
Los Angeles, California 90089-0781
{shahram,boghrati,barahman}@usc.edu
September 7, 2014
Abstract
This study quantifies the tradeoff associated with alternative physical representations of a social graph for
processing interactive social networking actions. We conduct this evaluation using a graph data store named Neo4j
deployed in a client-server (REST) architecture using the BG benchmark. In addition to the average response time of
a design, we quantify its SoAR, defined as the highest observed throughput given the following service level
agreement: 95% of actions to observe a response time of 100 milliseconds or faster. For an action such as computing
the shortest distance between two members, we observe a tradeoff between speed and accuracy of the computed result.
With this action, a relational data design provides a significantly faster response time than a graph design. The
graph designs provide a higher SoAR than a relational one when the social graph includes large member profile images
stored in the data store.
A Introduction
A graph database provides an intuitive representation of a social graph. It supports vertices that may represent
members and edges that may represent a relationship such as friendship between two members. Queries may filter
vertices of interest and navigate edges to retrieve relevant data. Updates may insert and delete a vertex, add and
remove edges between vertices, and change the property value of edges and vertices. Facebook’s TAO [1] is an example
graph data store that serves a social graph to hundreds of millions of users on a daily basis.

*In Proceedings of the Sixth TPC Technology Conference on Performance Evaluation and Benchmarking (TPCTC), Hangzhou,
China, September 2014.
One may represent a social graph using different physical graph representations. To illustrate, consider the
friendship relationship between two members A and B. It may start with one member, say Member A, extending a friend
invitation to Member B, and Member B accepting this invitation. Two physical representations, termed Labeled and
Distinct, are as follows. With Labeled, the friendship edge between Member A and B is assigned a value to identify it
as a friendship invitation. Once Member B accepts A’s invitation, the value of this edge changes to denote a
confirmed friendship. With Distinct, there are two types of edges, one for a pending friend invitation and a second
for a confirmed friendship. When Member B accepts A’s invitation, the system deletes the edge corresponding to the
friend invitation and creates a confirmed friendship edge between them. This design creates and deletes edges more
frequently than the Labeled design.
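The difference between the two designs can be sketched with a small in-memory model. This is only an illustration in Python, not Neo4j code; the function names and the dictionary and set layouts are hypothetical and chosen for brevity.

```python
# Hypothetical in-memory model of the two friendship designs; not Neo4j code.

def accept_labeled(edges, inviter, invitee):
    """Labeled design: change the label on the existing edge."""
    edges[(inviter, invitee)] = "confirmed"

def accept_distinct(invites, friends, inviter, invitee):
    """Distinct design: delete the invitation edge, create a friendship edge."""
    invites.remove((inviter, invitee))
    friends.add(frozenset((inviter, invitee)))

# Labeled: a single edge set, each edge carries a status value.
labeled_edges = {("A", "B"): "pending"}
accept_labeled(labeled_edges, "A", "B")

# Distinct: two edge sets, one per edge type.
invite_edges = {("A", "B")}
friend_edges = set()
accept_distinct(invite_edges, friend_edges, "A", "B")
```

In this sketch the Labeled accept is one in-place update, while the Distinct accept is one delete plus one insert, which mirrors the observation that Distinct creates and deletes edges more frequently.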
A research topic is: what are the tradeoffs associated with these alternative designs for different workloads? And
how do they compare with data stores that implement a different data model, such as relational database management
systems (RDBMSs)? To investigate these research topics, we had a choice of benchmarks including BG [7, 6, 14],
LinkBench [4], LDBC [11, 2], or a micro-benchmark such as [3] and [16]. After a careful analysis, we decided to use
BG for two reasons. First, BG is a stateful benchmark that quantifies both the average response time of a data store
and its throughput given a pre-specified service level agreement (SLA). The latter is termed Social Action Rating,
SoAR [7], and is similar to the tps rating¹ defined by the TPC-C benchmark [12, 15]. As reported in Section C.3, an
RDBMS may provide an average response time that is faster than Neo4j for some actions while Neo4j outperforms
the RDBMS when considering SoAR with certain database settings. Second, BG quantifies the amount of stale,
inconsistent, or invalid data (collectively termed unpredictable data [7, 8]) produced by a data store. This is
useful because certain social networking actions, such as computing the shortest distance between two members, may
utilize heuristic search techniques that do not produce correct results; see the discussion of Figure 4 in Section C.
The primary contribution of this study is twofold. First, it identifies four physical graph data designs for
processing interactive social networking actions, see Figure 3. Second, it evaluates these designs using the
Neo4j [22] data store and the BG benchmark. This includes extensions of BG with the following three graph oriented
actions: Get Shortest Distance, List Common Friends, and List Friends-of-Friends. The main findings of our evaluation
are as follows. The Distinct physical graph design provides a superior performance when compared with the Labeled
design. With the three new graph oriented actions, an industrial strength relational database management system
(SQL-X) provides faster response times than Neo4j configured with a variant of the Distinct design named
StoredDistinct (see description of Figure 3 for details). One reason for this is the normalization guideline of the
relational data model that represents a many-to-many friendship relationship as a table. This enables the graph
oriented actions to fetch a smaller amount of data from a single table to provide faster response times. With a
workload consisting of a mix of actions, SQL-X provides a higher SoAR than Neo4j when the social graph consists of no
images. When large profile images are stored in SQL-X, Neo4j provides a higher SoAR than SQL-X.

¹ SoAR is different than tps in that the SLA can be changed depending on the requirements of an application while
TPC-C’s specified SLA is fixed.
The rest of this paper is organized as follows. We survey the related work in Section B. Section C describes an
implementation of the BG benchmark using Neo4j, detailing four physical graph data designs and their performance
characteristics for different mixes of actions. Section C.3 quantifies the tradeoffs associated with a graph and a
relational data design. Our future research directions are contained in Section D.
B Related Work
Evaluation of graph data stores has been a subject of active research during the past few years. The average
response time of different actions of a microbenchmark is presented in [3] to compare two graph databases (Neo4j and
Dex) with an RDF store (RDF-3X) and two relational database management systems (PostgreSQL and Virtuoso). Similarly,
in [16], the response time of several social networking actions is used to compare the performance of alternative
graph query languages using Neo4j with Java Persistence API (JPA) using the MySQL relational database management
system. Both studies consider Neo4j deployed in either embedded or a client-server (REST) mode.
This study is different than [3, 16] along two dimensions. First, we focus on Neo4j Cypher REST to investigate the
alternative physical designs of a social graph, see the taxonomy of Figure 2 and its discussion in Section C.1.
Second, we use the BG benchmark to analyze both the average response time and SoAR of the different designs. This
analysis includes both read and write actions. (Both [3, 16] focus on read actions only.) A key finding is that a
design that provides a high performance with infrequent write actions may not perform well when the frequency of
write actions is higher, see Table 4 and its discussion in Section C.2. A novel feature of BG is its ability to
quantify the amount of erroneous data produced by a data store. We use this capability of BG to show that one may
trade performance for accuracy of results with an action such as Get Shortest Distance. To the best of our knowledge,
these findings are novel and have not been presented elsewhere.
C BG benchmark and its implementation using Neo4j
Figure 1 shows the conceptual design of BG’s social graph used for this evaluation. (See [6, 7, 9] for a
comprehensive description of BG.) The Members entity set contains those users with a registered profile. It consists
of a unique identifier and a fixed number of string attributes². One may configure BG to create a social graph with or
without images. In this paper, we consider both possibilities. With images, all experimental results are obtained
using a social graph configured with a 2 KB thumbnail image and a 12 KB profile image. Thumbnail images are displayed
when listing friends of a member and the higher resolution profile image is displayed when a member visits a profile.
A member may extend a friend invitation to another member or be friends with a member, represented using “Invite” and
“Friend” relationship sets, respectively. A resource may pertain to an image, a posted question, a technical
manuscript, etc. These entities are captured in one set named “Resources”. In order for a resource to exist, a member
must “Own” that resource. A member may post a resource, say an image, on the profile of another member, represented
as a “Posted on” relationship between two members and a resource. A member may comment on a resource. This is
implemented using the “Manipulation” relationship set.

² The size of these attributes is configurable [6].

Figure 1: BG benchmark’s conceptual schema.
BG uses a closed emulation model to generate a workload of actions for a data store. With this model, a thread
emulates a Member A who performs an action on another member or resource. This member who is performing
the action is termed a socialite. A thread does not emulate another socialite until the pending action of the current
socialite is processed. BG controls the load imposed on a data store by varying the number of threads used to emulate
concurrent socialites performing actions, see [7, 6] for details.
Figure 2 shows four different graph representations of this conceptual data model. We describe these alternatives
when presenting the different actions that constitute the core of BG’s workload. This discussion presents the average
response time ($\overline{RT}$) and Social Action Rating (SoAR) of the alternative graph models using a single node
Neo4j deployment. $\overline{RT}$ is quantified with BG emulating a single socialite issuing a mix of actions by
issuing one action at a time. It is the average amount of time elapsed from when a socialite issues a request to the
time Neo4j completes servicing the request. SoAR is the highest throughput observed with a service level agreement
(SLA) that requires 95% of actions to observe a response time of 100 milliseconds or faster with no stale data.
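The SoAR definition above can be sketched in a few lines of Python. This is an illustration only, not BG’s implementation: the function names, the list-of-runs input format, and the convention of returning 0 when no load level satisfies the SLA are assumptions, and the staleness part of the SLA is omitted.

```python
def meets_sla(response_times_ms, fraction=0.95, limit_ms=100.0):
    """True if at least `fraction` of the actions finished within `limit_ms`."""
    if not response_times_ms:
        return False
    fast = sum(1 for t in response_times_ms if t <= limit_ms)
    return fast / len(response_times_ms) >= fraction

def soar(runs):
    """Highest throughput among runs whose response times satisfy the SLA.

    `runs` holds one (throughput, response_times_ms) pair per load level.
    Returns 0 when no run satisfies the SLA (an assumed convention).
    """
    return max((tput for tput, times in runs if meets_sla(times)), default=0)

# Hypothetical measurements at three load levels.
runs = [
    (100, [20] * 100),              # every action is fast
    (250, [40] * 96 + [300] * 4),   # 96% within 100 ms
    (400, [60] * 90 + [500] * 10),  # only 90% within 100 ms
]
rating = soar(runs)
```

With these made-up numbers the third load level violates the SLA, so the rating is the throughput of the second level.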
The target hardware platform consists of two PCs connected using a Gigabit switch. Each PC consists of an i7-4770
processor, 16 GB of memory, one TB of disk storage, and a Gigabit networking card. The operating system
of each PC is a 64 bit Windows 2012 Server. The version of Neo4j server is 2.0.1 and we used Neo4j’s Cypher³
query language to implement the Client that performs the interactive social networking actions (termed BGClient).
All experiments assume a social graph consisting of 100,000 members with 100 friends per member ($\phi$) and 100
resources per member ($\rho$). We classify BG’s actions into read and write. Below, we present them in turn.

³ Cypher is a declarative language similar to SQL.

Figure 3: Four physical graph designs.
C.1 BG’s Read Actions
BG’s actions and their graph implementation are as follows. First, the View Profile (VP) action emulates a Socialite
with member id A visiting the profile of a member with id B. BG generates A and B as input to VP. A may equal B,
emulating a socialite referencing her own profile. The output of VP is the profile information of B, including B’s
attributes and the following two simple analytics: B’s number of friends and number of posted resources on her wall.
If the socialite is referencing her own profile (A equals B) then VP retrieves a third simple analytic, B’s number of
pending friend invitations.
The observed system performance with the VP action depends on the physical representation of the graph database.
Figure 3 shows four different physical representations using a two dimensional quad, see also Figure 2. The two
dimensions correspond to the alternative representations of the simple analytics and friendship. One may implement
the simple analytics using a Cypher query that computes the required value every time, see the ComputeLabeled and
ComputeDistinct queries of Table 2. Alternatively, one may store the value of these simple analytics and update them
in the presence of write actions, enabling the VP action to simply look up the stored value, see the
StoredLabeled/StoredDistinct query of Table 2. These two alternatives are termed⁴ Compute and Stored, respectively.
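The Compute and Stored alternatives can be contrasted with a minimal in-memory sketch. The two Python classes below are hypothetical stand-ins for the graph store, not Neo4j code: one counts edges on every read, the other maintains a per-member counter that each write must keep up to date.

```python
class ComputeProfile:
    """Compute design: count friendship edges on every View Profile."""

    def __init__(self):
        self.friend_edges = set()

    def add_friendship(self, a, b):
        self.friend_edges.add(frozenset((a, b)))

    def friend_count(self, member):
        # Scans the edges each time, analogous to running a COUNT query.
        return sum(1 for edge in self.friend_edges if member in edge)


class StoredProfile(ComputeProfile):
    """Stored design: keep a per-member counter updated by write actions."""

    def __init__(self):
        super().__init__()
        self.friends_count = {}

    def add_friendship(self, a, b):
        edge = frozenset((a, b))
        if edge in self.friend_edges:
            return  # already friends; counters stay unchanged
        self.friend_edges.add(edge)
        for member in (a, b):
            self.friends_count[member] = self.friends_count.get(member, 0) + 1

    def friend_count(self, member):
        # Simple lookup of the stored value.
        return self.friends_count.get(member, 0)


compute, stored = ComputeProfile(), StoredProfile()
for graph in (compute, stored):
    graph.add_friendship("A", "B")
    graph.add_friendship("A", "C")
```

Both objects report the same counts; the Stored variant shifts work from the read path to the write path, which is the tradeoff the text describes.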
With the friendship relationship, one may represent pending friend invitations and the confirmed friendships as
unique edges (relationships) independent of one another. This design is termed Distinct friendship. Alternatively,
one may represent both as one edge and label the edge to identify either a pending invitation or a confirmed
friendship. This design is termed Labeled friendship. These two alternatives constitute the rows of Figure 3,
resulting in four physical graph designs shown in the quad.
The first row of Table 1 shows the average response time, $\overline{RT}$, observed with the alternative designs for
the VP action. The StoredDistinct is clearly the fastest of the alternatives. Its SoAR with VP is more than twice
that of the Compute designs, see the first row of Table 4.

⁴ They are termed Basic and Manual in [10] with a relational and JSON representation of BG social graph.

                     ComputeLabeled   ComputeDistinct   StoredLabeled   StoredDistinct
View Profile (VP)         308               93                12               8
List Friend (LF)          435              293               520             313

Table 1: $\overline{RT}$, in milliseconds, for the alternative physical graph representations using Neo4j with a
100K social graph, $\phi$=100 friends per member, $\rho$=100 resources per member.
The List Friend (LF) action of BG emulates a socialite A viewing member B’s list of friends. Similar to the
discussion of VP, A may equal B, emulating the socialite viewing her own list of friends. LF retrieves the profile
information of each friend including their thumbnail image and excluding their profile image. We implement LF using
the following Cypher query:
    MATCH (u1:Members)-[f:Friend]-(u2:Members) WHERE u1.userid = B AND f.status=Confirmed
    RETURN u2.userid, u2.username, u2.fname, u2.lname,..., u2.thumbnail.
Table 1 shows that representing a friendship as a distinct edge is faster than using labeled edges. With the latter,
the query must incur the additional overhead of examining the value of each label (pending versus confirmed
friendship) to process the LF action. However, the alternative designs provide comparable SoAR, see the second row of
Table 4.
The Get Shortest Distance (GSD) action of BG computes the distance between two members in the social graph.
If these two members are the same user then their shortest path is zero. If they are friends then their shortest
path is one. If they belong to two disjoint social graphs then their shortest path is MAX-INT. The Cypher query
to implement GSD is:
    MATCH p=shortestPath((u:Members)-[:Friend*..depthToTraverse]-(u2:Members))
    WHERE u.userid=A and u2.userid=B RETURN length(p) as total.
The parameter depthToTraverse defines the number of levels (termed depth) of friendship relationship traversed by
the shortestPath function of Neo4j, striking a balance between the observed response times and the accuracy of the
computed value. Increasing depth may enhance the accuracy of GSD and slow down its processing, resulting in a
higher response time.
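The role of the depth limit can be illustrated with a depth-bounded breadth-first search. The Python sketch below is a stand-in for the semantics described above and is not Neo4j’s shortestPath implementation; the function name, the adjacency-dictionary input, and the use of `sys.maxsize` for MAX-INT are assumptions.

```python
import sys
from collections import deque

MAX_INT = sys.maxsize  # stands in for the paper's MAX-INT

def shortest_distance(friends, source, target, max_depth):
    """Breadth-first search that gives up after `max_depth` levels."""
    if source == target:
        return 0
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        member, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the allowed depth
        for friend in friends.get(member, ()):
            if friend == target:
                return depth + 1
            if friend not in visited:
                visited.add(friend)
                frontier.append((friend, depth + 1))
    return MAX_INT

# A chain A - B - C - D plus an isolated member E.
friends = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
```

With a limit of 3 the sketch finds D at distance 3 from A, while a limit of 2 reports MAX-INT even though D is reachable. That second answer is the kind of incorrect result a small depth trades for speed.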
Figure 4.a shows the average response time of GSD as a function of the depth traversed with 10 and 100 friends per
member. BG quantifies the percentage of GSD actions that observe incorrect results, termed unpredictable data [7],
$\tau$. Figure 4.b shows the percentage of GSD requests that observe accurate results, termed Accuracy ($100-\tau$),
as a function of the depth with different number of friends per member. As we increase the traversed depth on the
x-axis, the computed distance becomes more accurate (i.e., $\tau$ decreases [7]) and the system becomes slower as the
shortestPath function visits many more vertices. A sufficiently high depth value causes the shortestPath to visit all
vertices and terminate, producing 100% accurate results. The response time levels off beyond this depth.
Data Model: ComputeLabeled
  a. MATCH (u:`Members`)-[f:`Friend`]-(uu:`Members`) WHERE u.userid=profileOwnerID AND f.status=Confirmed
     RETURN COUNT(uu) AS total
  b. MATCH (u:`Members`)<-[f:`Friend`]-(uu:`Members`) WHERE u.userid=profileOwnerID AND f.status=Pending
     RETURN COUNT(uu) AS total
  c. MATCH (u:`Members`)<-[c:`Postedon`]-(r:`Resources`) WHERE u.userid=profileOwnerID
     RETURN COUNT(r) AS total
  d. MATCH (u:`Members`) WHERE u.userid=profileOwnerID
     RETURN u.userid, u.username, u.lname, u.fname, u.gender, u.dob, u.jdate, u.ldate, u.address,
     u.email, u.tel, u.pic

Data Model: ComputeDistinct
  a. MATCH (u:`Members`)-[f:`Friend`]-(uu:`Members`) WHERE u.userid=profileOwnerID
     RETURN COUNT(uu) AS total
  b. MATCH (u:`Members`)<-[f:`Invite`]-(uu:`Members`) WHERE u.userid=profileOwnerID
     RETURN COUNT(uu) AS total
  c. MATCH (u:`Members`)<-[c:`Postedon`]-(r:`Resources`) WHERE u.userid=profileOwnerID
     RETURN COUNT(r) AS total
  d. MATCH (u:`Members`) WHERE u.userid=profileOwnerID
     RETURN u.userid, u.username, u.lname, u.address, u.gender, u.dob, u.jdate, u.ldate, u.fname,
     u.email, u.tel, u.pic

Data Model: StoredLabeled / StoredDistinct
     MATCH (u:`Members`) WHERE u.userid=profileOwnerID
     RETURN u.userid, u.username, u.lname, u.fname, u.gender, u.dob, u.jdate, u.ldate, u.address,
     u.email, u.tel, u.friendsCount, u.pendingfCount, u.resourcesCount, u.pic

Table 2: Cypher queries that implement the View Profile action with four different data models.

Figure 4a: Average Response Time ($\overline{RT}$).
Figure 4b: Accuracy.
Figure 4: Average response time and accuracy of GSD as a function of the traversed depth with a 100K member social
graph and two different settings for the number of friends per member ($\phi$ = 10 and 100).

More formally, the response time levels off when the depth traversed multiplied by the number of friends equals
the total number of members, resulting in 100% accurate results. For example, in Figure 4.a, with the 100K social
graph and 100 friends per member, the response time levels off at a depth of 1,000. It levels off at a depth of
10,000 with 10 friends per member. The first row of Table 3 shows the observed response time with a depth of 20,000
with different number of friends per member, $\phi$. With this depth, GSD provides 100% accurate results and its
response time levels off with all three $\phi$ values.
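The level-off depths quoted above follow from the stated rule that depth multiplied by friends per member equals the number of members. A tiny Python check of that arithmetic, with a hypothetical helper name:

```python
def level_off_depth(members, friends_per_member):
    """Depth at which depth * friends_per_member reaches the member count."""
    return members // friends_per_member

depth_with_100_friends = level_off_depth(100_000, 100)
depth_with_10_friends = level_off_depth(100_000, 10)
```

Both values fall below the depth of 20,000 used for Table 3, which is consistent with GSD being fully accurate at that depth for 10 and 100 friends per member.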
With a fixed depth for the shortestPath function, the response time is faster with fewer friends per member as this
function visits fewer vertices. Hence, its accuracy is also lower. To illustrate, consider a depth of 100 on the
x-axis of Figure 4. The observed response time with 10 friends per member is six times faster, 100 versus 600
milliseconds. Moreover, the accuracy is significantly lower, 7% versus 25%, as its traversal of each depth visits
fewer vertices (10 times lower) and its likelihood of visiting the vertex of interest is lower. The first row of
Table 3 shows the response time increases as a function of $\phi$ as GSD must process many more edges.
The View Friend Request (VFR) action of BG retrieves Socialite A’s pending friend requests, retrieving the profile
information of each member who has generated a friend request for member A. The behavior of VFR with Neo4j is
similar to the discussion of LF.
A socialite uses the View Comments on Resource (VCR) action to display the attributes of comments posted
on a resource with a unique RID. Its Cypher query is as follows:
    MATCH (u:Members)-[m:Manipulation]->(r:Resources) WHERE r.rid=RID
    RETURN u.userid, r.rid, m.mid, m.type, m.content, m.timestamp.
The socialite may post and delete comments on a resource (PCR and DCR), which create and delete
edges between a member and a resource vertex, respectively.
The View Top-K Resources (VTR) enables a socialite (Member A) to retrieve and display her top k resources
posted on her wall. Both the value of k and the definition of “top” are configurable. Our Cypher implementation uses
the unique id assigned to a resource (rid) as the definition of top:
    MATCH (u:Members)<-[cf:PostedOn]-(r:Resources) WHERE u.userid=A ORDER BY r.rid LIMIT k.

                                  $\phi$=10   $\phi$=100   $\phi$=1,000
Get Shortest Distance (GSD)          402        2,733        41,027
List Common Friends (LCF)          2,120        4,368        34,630
List Friends-of-Friends (LFF)         12          212         7,939

Table 3: $\overline{RT}$, in milliseconds, of the StoredDistinct physical graph design as a function of the number of
friends ($\phi$) with a 100K social graph and $\rho$=100 resources per member.
The List Common Friends (LCF) action computes the common friends of two members. If these two members are the same
member then their common friends is an empty set. If they are friends then LCF retrieves their common friends
excluding themselves. Otherwise, if their distance is three or higher, then the result is an empty set. The set is
defined as the members who are a distance of one from both members. The Cypher query is:
    MATCH (u1:Members), (u2:Members), (mf:Members) WHERE u1.userid=A AND u2.userid=B
    AND (u1)-[:Friend]-(mf)-[:Friend]-(u2) RETURN mf.userid.
The response time of LCF increases as a function of the number of friends per member, $\phi$. (See the second row of
Table 3.) At times, the result of the LCF action might be the empty set as its input members may have no common
friends. The likelihood of this is lower with higher values of $\phi$, explaining the higher average response time.
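The LCF definition, members at a distance of one from both inputs, amounts to a set intersection. A minimal Python sketch over a hypothetical adjacency dictionary (the function name and sample data are illustrative, not part of BG):

```python
def list_common_friends(friends, a, b):
    """Members at distance one from both a and b; empty when a equals b."""
    if a == b:
        return set()
    return (friends.get(a, set()) & friends.get(b, set())) - {a, b}

friends = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"B"},
}
common_ab = list_common_friends(friends, "A", "B")
```

Here A and B are friends and share C, so the result contains only C; D and E have no friend in common, so their result is empty.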
The List Friends-of-Friends (LFF) action computes those members who are a distance of two from the specified member,
including their common friends. The Cypher query to implement this action is as follows:
    MATCH (u1:Members)-[:Friend*2..2]-(u2:Members) WHERE u1.userid=A and NOT (u1)-[:Friend]-(u2)
    RETURN distinct u2.userid.
The third row of Table 3 shows the response time of the LFF action increases superlinearly as a function of $\phi$.
With LFF, a tenfold increase in the value of $\phi$ results in a tenfold increase in the number of retrieved userids.
More precisely, given M members, BG constructs the social graph by assigning members (i+j)%M as friends of Member i
where the value of j varies from 1 to⁵ $\phi/2$. Hence, LFF retrieves $2\phi$ userids. For example, with $\phi$=10 and
100, LFF retrieves 20 and 200 members, respectively. While this explains the higher response time as a function of
$\phi$, there appears to be additional overhead that causes the response time of Neo4j to increase superlinearly.
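The graph construction rule can be checked with a short Python sketch. It assumes that j ranges from 1 to half the number of friends per member and that the quoted userid counts cover every member within two friendship hops, direct friends included; both are readings of the text above, and the function names are hypothetical.

```python
def build_social_graph(num_members, friends_per_member):
    """Member i befriends (i + j) % M for j = 1 .. friends_per_member / 2."""
    friends = {i: set() for i in range(num_members)}
    for i in range(num_members):
        for j in range(1, friends_per_member // 2 + 1):
            other = (i + j) % num_members
            friends[i].add(other)
            friends[other].add(i)  # friendship is symmetric
    return friends

def within_two_hops(friends, member):
    """Distinct members reachable over one or two friendship edges."""
    reached = set(friends[member])
    for friend in friends[member]:
        reached |= friends[friend]
    reached.discard(member)
    return reached

graph = build_social_graph(num_members=1000, friends_per_member=10)
two_hop_count = len(within_two_hops(graph, 0))
```

Under these assumptions every member ends up with exactly 10 friends because of the wrap-around of the mod function, and 20 members lie within two hops, matching the example of 20 userids for 10 friends per member.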
C.2 BG’s Write Actions
BG supports four write actions that impact the friendship relationship (edges) between members (vertices). These
are Invite Friend (IF), Accept Friend Request (AFR), Reject Friend Request (RFR), and Thaw Friendship (TF).
All involve Socialite A invoking the action on Member B. These actions modify either the presence of edges or
the attribute value of an edge between vertices. For example, the Cypher create edge command for the IF action with
the labeled design is as follows:
    MATCH (u1:Members), (u2:Members) WHERE u1.userid=A AND u2.userid=B
    CREATE (u1)-[:Friend {status:pending}]->(u2).

⁵ The torus characteristics of the mod function guarantees $\phi$ friends per member.

Workload              ComputeLabeled   ComputeDistinct   StoredLabeled   StoredDistinct
View Profile (VP)          971              714              2,205           2,251
List Friend (LF)            93              119                112             118
0.1% Write Actions         117              459                819             835
1% Write Actions            46              369                435             499
10% Write Actions           32              162                  0             100

Table 4: SoAR of the four physical graph models with workloads consisting of VP only, LF only, and a mix of read and
write actions.
With the Stored representations, these write actions must maintain the simple analytics attribute values of a vertex
(member) up to date. For example, the AFR action must increment the number of friends of the vertices corresponding
to Members A and B. Moreover, it must decrement⁶ the number of pending friend invitations for Member A.
Table 4 shows the SoAR of the alternative physical graph designs for a mix of read and write actions. Moving down
the first column, the workloads increase the frequency of write actions such as Invite Friend and Thaw Friendship,
see Table 5. This reduces the SoAR of all designs shown in Figure 2. With a mix consisting of 10% write actions,
computing the analytics of the View Profile action provides a higher performance than the stored designs due to
their overhead to maintain the value of simple analytics up to date.
Representing pending and confirmed friendship relationships with unique edges provides a higher performance
when compared with labeled edges, compare ComputeDistinct and ComputeLabeled columns in Table 4. Both slow
down as a function of an increasing mix of write actions. With ComputeDistinct, when a member confirms a pending
friendship invitation, the system deletes an edge and inserts a new one. With ComputeLabeled, the same action
changes the value associated with a property of an edge. This consumes more system resources with our workloads,
resulting in a lower SoAR.
C.3 Comparison with SQL-X
This section compares the performance of an industrial strength relational database management system (RDBMS)
named⁷ SQL-X with Neo4j using BG. The schema used for the RDBMS to represent the social graph is as follows:
• Users(userid, username, pw, fname, lname, gender, dob, jdate, ldate, address, email, tel, profileImage,
  thumbnailImage, #Friends, #FriendInvitations, #Resources)
• Friendship(inviter, invitee, status)
• Resource(rid, creatorid, walluserid, type, body, doc)
• Manipulation(mid, rid, modifierid, creatorid, timestamp, type, content)
Underlined attributes are indexed and serve as the primary key of a table. An italicized attribute represents a
foreign key relationship. A confirmed friendship between two members is represented as two rows.

⁶ BG is a stateful benchmark that generates valid actions only. When it invokes the AFR action involving Member A and
B, it does so based on its knowledge of B having a pending friend invitation from A. See [7] for details.
⁷ Due to licensing agreement, we cannot disclose the identity of this system.

BG Social Actions                    Type    Very Low (0.1%) Write   Low (1%) Write   High (10%) Write
View Profile, VP                     Read           40%                   40%               35%
List Friends, LF                     Read            5%                    5%                5%
View Friend Requests, VFR            Read            5%                    5%                5%
Invite Friend, IF                    Write        0.04%                  0.4%                4%
Accept Friend Request, AFR           Write        0.02%                  0.2%                2%
Reject Friend Request, RFR           Write        0.02%                  0.2%                2%
Thaw Friendship, TF                  Write        0.02%                  0.2%                2%
View Top-K Resources, VTR            Read           40%                   40%               35%
View Comments on a Resource, VCR     Read          9.9%                    9%                1%

Table 5: Three mixes of social networking actions.

Algorithm GSD(UserID1, UserID2, MaxDepth):
  If UserID1 equals UserID2 return 0
  If MaxDepth equals 0 return MAX-INT
  Initialize Visited ← {}
  Initialize SRC ← {UserID1}
  Initialize CurrentDepth ← 0
  While (true):
    (1) CurrentDepth ← CurrentDepth+1
    (2) If (CurrentDepth > MaxDepth) return MAX-INT
    (3) If (Visited contains all members) return MAX-INT
    (4) Qry ← "SELECT unique inviteeid FROM Friendship WHERE "
    (5) For each userid in SRC:
          Extend Qry with the clause "inviterid=userid" using boolean or connective
    (6) Visited ← Visited ∪ SRC
    (7) Execute Qry using RDBMS to obtain a result set R
    (8) If (UserID2 ∈ R) return CurrentDepth
    (9) SRC = R − (R ∩ Visited)
    (10) If (SRC is empty) return MAX-INT

Figure 5: Get Shortest Distance using SQL-X.
Except for the LCF and the GSD actions, an implementation of BG’s actions using the SQL query language is
straightforward and described in [7, 10, 6]. We implement LCF(A,B) using a single query:
    SELECT DISTINCT f1.inviteeid FROM Friendship f1, Friendship f2 WHERE f1.inviteeid=f2.inviteeid and
    f1.inviterid=A and f2.inviterid=B and f1.status=Confirmed and f2.status=Confirmed.
Figure 5 shows an implementation of the Breadth First Search (BFS) algorithm to implement GSD using the SQL query
language. This algorithm issues a SQL query for each level of BFS starting with one member of the social graph,
identified by UserID1. It terminates once it encounters the other member of the social graph (UserID2), exhausts all
the members of the social graph, or exceeds its maximum allowed depth.

                                 SQL-X    Neo4j
Get Shortest Distance (GSD)       718     2,588
List Common Friends (LCF)          14     4,317
List Friends-of-Friends (LFF)      26       163

Table 6: $\overline{RT}$ in milliseconds with maximum depth=1,000.

                        With Images          No Images
                      SQL-X    Neo4j      SQL-X     Neo4j
0.1% Write Action      360      835      20,550     1,460
1% Write Action        290      499      16,135       688
10% Write Action         0      100       2,095       150

Table 7: SoAR with 100K members, $\phi$=100 fpm, and $\rho$=100 rpm, with and without images.
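The level-by-level search of Figure 5 can be sketched in runnable form. The Python code below uses the standard sqlite3 module as a stand-in for SQL-X, so it only illustrates the control flow and says nothing about SQL-X itself. The `total_members` parameter, the sample rows, and the use of an IN list in place of the chain of OR clauses are assumptions made for the example.

```python
import sqlite3
import sys

MAX_INT = sys.maxsize  # stands in for MAX-INT in Figure 5

def gsd(conn, user1, user2, max_depth, total_members):
    """Level-by-level BFS that issues one SQL query per level."""
    if user1 == user2:
        return 0
    if max_depth == 0:
        return MAX_INT
    visited, src, depth = set(), {user1}, 0
    while True:
        depth += 1
        if depth > max_depth or len(visited) >= total_members:
            return MAX_INT
        # One "?" placeholder per member of the current frontier.
        marks = ",".join("?" for _ in src)
        qry = ("SELECT DISTINCT inviteeid FROM Friendship "
               f"WHERE inviterid IN ({marks})")
        visited |= src
        result = {row[0] for row in conn.execute(qry, tuple(src))}
        if user2 in result:
            return depth
        src = result - visited
        if not src:
            return MAX_INT

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Friendship (inviterid INT, inviteeid INT, status TEXT)")
# A confirmed friendship is stored as two rows, one per direction.
pairs = [(1, 2), (2, 3), (3, 4)]
rows = [(a, b, "Confirmed") for a, b in pairs]
rows += [(b, a, "Confirmed") for a, b in pairs]
conn.executemany("INSERT INTO Friendship VALUES (?, ?, ?)", rows)
```

On this four-member chain, `gsd(conn, 1, 4, 5, 5)` needs three queries to reach member 4, while a maximum depth of 2 stops early and returns MAX-INT. Each iteration touches only the Friendship table, which is the property the comparison with Neo4j relies on.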
Table 6 shows the average response time of GSD, LCF, and LFF with SQL-X and Neo4j for a social graph consisting of
100K members, 100 friends per member (fpm), and 100 resources per member (rpm). SQL-X is faster than Neo4j for
processing each of these commands. An SQL implementation of these commands references a single table, Friendship,
that is a vertical slice of the data. For example, the GSD algorithm of Figure 5 queries the Friendship table
repeatedly in Step 5. Neo4j, on the other hand, may retrieve a vertex that contains several property values of a
member including a 12 KB profile image. It is possible to further enhance the reported GSD numbers with SQL-X by
implementing the algorithm of Figure 5 as a stored procedure.
Table 7 shows the observed SoAR with SQL-X and Neo4j (using the StoredDistinct design, see Figure 3) for the
three mixes of write actions shown in Table 5. We consider a BG database configured with either images or no images.
The latter lacks the 12 KB profile image and the 2 KB thumbnail image. With both, the schema of SQL-X stores
the simple analytics of a member as an attribute value of a row and requires a write action to maintain these values
up-to-date [10].
SQL-X performs poorly when required to store images larger than 4 KB [19, 10] and Neo4j outperforms it by
a wide margin. With a social graph that has no images, SQL-X outperforms Neo4j by a wide margin, see last two
columns of Table 7. SoAR of Neo4j is also enhanced when the social graph has no images. In [10], we show that
storing profile images in the file system, termed Boosted SQL-X design, enhances the SoAR of SQL-X by more than
tenfold. A future research direction is to analyze Neo4j with images stored in the file system (similar to the
discussion of Boosted SQL-X). We speculate its performance to fall between the two extremes shown in Table 7.
D Future Research Direction
We are extending this study by considering additional graph data stores, characterizing their scalability and their
role in processing more complex social networking actions. We describe these in turn.
We are using BG to complete an evaluation of Neo4j and other graph databases such as G* [18] and OrientDB [23].
This includes an analysis of their scalability characteristics and a comparison with data stores that support
alternative data models, e.g., document stores, extensible stores, key-value stores and relational DBMSs. We also
intend to analyze the overhead of an Object Graph Model (OGM) such as Blueprint when compared to using the native
interface of a graph data store [16].
Moreover, we intend to investigate alternative physical graph designs for processing more complex social networking
actions, namely, feed following actions such as Share Resource (SR) and View News Feed (VNF) [9]. These model
a member producing events for consumption by others and displaying the events generated by other members and
entities, typically their friends or those that they follow. Both the highly variable fan-out of the follows graph
along with its dynamically changing structure (e.g., a member thaws friendship with another member) make an
implementation of feed following challenging [20, 9]. One may introduce different designs and implementations to
address these challenges [21, 17, 5]. One is to materialize the feed of a member and maintain it up to date when new
events are produced by those she follows [21]. A graph database such as Neo4j may be suitable for this Push paradigm
because it supports extensions of a vertex with new attributes. A design may split a vertex into multiple vertices
once it increases beyond a certain size [13]. Finally, edges may maintain the relationship between older and newer
feed as a member’s feed grows in size. An alternative to Push is to Pull events and may include clever designs that
synergize those members with mutual friends by maintaining one news feed for them. We plan to investigate these
alternative implementations with Neo4j and other graph data stores.
References
[1] Z. Amsden, N. Bronson, G. Cabrera III, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, J. Hoon,
S. Kulkarni, N. Lawrence, M. Marchukov, D. Petrov, L. Puzar, and V. Venkataramani. TAO: How Facebook Serves the
Social Graph. In SIGMOD Conference, 2012.
[2] R. Angles, P. Boncz, J. Larriba-Pey, I. Fundulaki, T. Neumann, O. Erling, P. Neubauer, N. Martinez-Bazan,
V. Kostev, and I. Toma. The Linked Data Benchmark Council: A Graph and RDF Industry Benchmarking Effort.
SIGMOD Rec., 43:27–31, March 2014.
[3] R. Angles, A. Prat-Perez, D. Dominguez-Sal, and J. Larriba-Pey. Benchmarking Database Systems for Social
Network Applications. In First International Workshop on Graph Data Management Experiences and Systems,
GRADES ’13, 2013.
[4] T. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. LinkBench: A Database Benchmark Based on
the Facebook Social Graph. ACM SIGMOD, June 2013.
[5] X. Bai, F. P. Junqueira, and A. Silberstein. Cache Refreshing for Online Social News Feeds. In CIKM, 2013.
[6] S. Barahmand. Benchmarking Interactive Social Networking Actions, Ph.D. thesis, Computer Science Department,
USC, 2014.
[7] S. Barahmand and S. Ghandeharizadeh. BG: A Benchmark to Evaluate Interactive Social Networking Actions.
Proceedings of 2013 CIDR, January 2013.
[8] S. Barahmand and S. Ghandeharizadeh. Benchmarking Correctness of Operations in Big Data Applications.
Proceedings of IEEE MASCOTS, 2014.
[9] S. Barahmand and S. Ghandeharizadeh. Extensions of BG for Testing and Benchmarking Alternative Implementations
of Feed Following. ACM SIGMOD Workshop on Reliable Data Services and Systems (RDSS), June 2014.
[10] S. Barahmand, S. Ghandeharizadeh, and J. Yap. A Comparison of Two Physical Data Designs for Interactive
Social Networking Actions. In CIKM, 2013.
[11] P. Boncz. LDBC: Benchmark for Graph and RDF Data Management. IDEAS, October 2013.
[12] Transaction Processing Performance Council. TPC Benchmarks, http://www.tpc.org/information/benchmarks.asp.
[13] R. Nishtala et al. Scaling Memcache at Facebook. NSDI, 2013.
[14] S. Ghandeharizadeh and S. Barahmand. A Mid-Flight Synopsis of the BG Social Networking Benchmark. Fourth
Workshop on Big Data Benchmarking, October 2013.
[15] J. Gray. The Benchmark Handbook for Database and Transaction Systems (2nd Edition), Morgan Kaufmann
1993, ISBN 1-55860-292-5.
[16] F. Holzschuher and R. Peinl. Performance of Graph Query Languages: Comparison of Cypher, Gremlin and
Native Access in Neo4J. In Proceedings of the Joint EDBT/ICDT 2013 Workshops, EDBT ’13, 2013.
[17] F. P. Junqueira, V. Leroy, M. Serafini, and A. Silberstein. Shepherding Social Feed Generation with Sheep. In
SNS, 2012.
[18] A. Labouseur, P. Olsen, and J. Hwang. Scalable and Robust Management of Dynamic Graph Data. In VLDB
Workshop on Big Dynamic Distributed Data, 2013.
[19] R. Sears, C. V. Ingen, and J. Gray. To BLOB or Not To BLOB: Large Object Storage in a Database or a
Filesystem. Technical Report MSR-TR-2006-45, Microsoft Research, 2006.
[20] A. Silberstein, A. Machanavajjhala, and R. Ramakrishnan. Feed Following: The Big Data Challenge in Social
Applications. In DBSocial, 2011.