
Muppet: MapReduce-Style Processing of Fast Data

Wang Lam 1, Lu Liu 1, STS Prasad 1, Anand Rajaraman 1, Zoheb Vacheri 1, AnHai Doan 1,2

{wlam,luliu,stsprasad,zoheb,anhai}@walmartlabs.com and [email protected]
1 @WalmartLabs, 2 University of Wisconsin-Madison

ABSTRACT

MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, to achieve low latency and high scalability?

In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce, but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our current solutions. Finally, we describe our experience and lessons learned with Muppet, which has been used extensively at Kosmix and WalmartLabs to power a broad range of applications in social media and e-commerce.

1. INTRODUCTION

MapReduce [8] has emerged as a popular paradigm to process big data. Using MapReduce, a developer simply writes a map function and a reduce function. The system automatically distributes the workload over a cluster of commodity machines, monitors the execution, and handles failures.

In the past few years, however, not just big data, but fast data, i.e., high-speed real-time and near-real-time data streams, has also exploded in volume and availability. Prime examples include sensor data streams, real-time stock market data, and social-media feeds such as Twitter, Facebook, YouTube, Foursquare, and Flickr. The emergence of social media in particular has greatly fueled the growth of fast data, with well over 4000 tweets per second (400 million tweets per day [12]), 3 billion Facebook likes and comments per day [9], and 5 million Foursquare checkins per day [2].

Numerous applications must process fast data, often with minimal latency and high scalability. For example, an application that monitors the Twitter Firehose for an ongoing earthquake may want to report relevant information within a few seconds of when a tweet appears, and must handle drastic spikes in the tweet volumes. As the number and sophistication of such applications grow, a natural question arises: Can we provide a MapReduce-like framework for fast data, so that developers can quickly write and execute such applications on large clusters of machines, to achieve low latency and high scalability?

In this paper we describe our investigation of this question, as carried out at Kosmix, a San-Francisco-Bay-Area startup, and at WalmartLabs, an advanced development lab newly established by Walmart (Walmart acquired Kosmix in May 2011 to form the seed of WalmartLabs). In Section 2 we describe a number of motivating applications that process fast data, and argue why MapReduce is not well suited for such applications.

In Section 3 we describe MapUpdate, a framework to process fast data. Like MapReduce, in MapUpdate the developer only has to write a few functions, specifically map and update ones. The system automatically executes these functions over a cluster of machines. MapUpdate, however, differs from MapReduce in several important aspects. First, MapUpdate operates on data streams, so map and update functions must be defined with respect to streams. For example, mappers and updaters map streams to streams, split streams, or merge streams. Second, streams may never end, so updaters use storage called slates to summarize the data they have seen so far. The notion of slates does not arise in MapReduce, nor in many recently proposed stream-processing systems (see Section 6). In MapUpdate, slates are in effect the “memories” of updaters, distributed across multiple map/update machines as well as persisted in a key-value store for later processing. Making such pieces of memory explicit and managing them as “first-class citizens,” in a near-real-time fashion, is a key distinguishing aspect of the MapUpdate framework. Finally, a MapUpdate application often involves not just a mapper followed by an updater, but many of them in an elaborate workflow that consumes and generates data streams.

In Section 4 we describe Muppet, a MapUpdate implementation developed at Kosmix and WalmartLabs. We discuss the key challenges of Muppet in terms of distributed execution, managing slates, handling failures, and reading slates, and sketch our solutions. Since mid-2010, we have used Muppet extensively to develop many social media and e-commerce applications (over streams such as Twitter and Foursquare). We describe this experience, lessons learned, as well as current and future extensions. We discuss related work in Section 6 and conclude in Section 7.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 38th International Conference on Very Large Data Bases, August 27th-31st 2012, Istanbul, Turkey.
Proceedings of the VLDB Endowment, Vol. 5, No. 12
Copyright 2012 VLDB Endowment 2150-8097/12/08... $10.00.

arXiv:1208.4175v1 [cs.DB] 21 Aug 2012


2. MOTIVATING APPLICATIONS

We describe several motivating applications, argue why MapReduce is not well suited for these applications, then outline our desiderata for the MapUpdate framework.

Example 1. Consider an application that monitors the Foursquare-checkin stream to count the number of checkins by retailer (e.g., JCPenney, Best Buy, and Walmart). For each incoming checkin, the application analyzes the text of the checkin (typically represented as a JSON object) to identify the retailer (if any), then increases the appropriate count. The counts are maintained continuously and displayed “live” on a Web page. ✷

Example 2. The second application monitors the Twitter Firehose to detect hot topics as they occur. For ease of exposition, we will use the following simple heuristic: The application first classifies each incoming tweet into a small set of pre-defined topics. Next, as a pre-specified time interval (for example, a minute) passes, the application counts the number of tweets per topic. If this number divided by the average number of tweets that mention the same topic in the corresponding time interval each day (this average number is maintained by the application across multiple days) exceeds a pre-specified threshold, then the application emits the topic and the minute. Thus, the output is a stream of <topic, minute> pairs that reports which topic is hot for each minute. ✷

Example 3. The third application maintains a reputation score for each Twitter user as users tweet. It analyzes each incoming tweet to determine if the tweet affects the score of any users, then changes those scores. The score of a user can be affected by many factors. For example, if a user A retweets or replies to a user B, then the score of B may change, depending on the score of A. The output is a real-time data structure (e.g., a hash table) of <user, score> pairs. ✷

Other applications include maintaining the top-ten URLs being passed around on Twitter, and maintaining live counters of the number of HTTP requests made to various parts of a Web site.

The key commonality underlying all of these applications is that they perform stream computations, which consume streams and produce streams or continuously-updated data structures as the output. We argue that MapReduce and variations of it are not well suited to such computations, for the following reasons. First, MapReduce runs on a static snapshot of a data set, while stream computations proceed over an evolving data stream. In MapReduce, the input data set does not (and cannot) change between the start of the computation and its finish, and no reducer's input is complete until all mappers have finished. In stream computations, the data is changing all the time; there is no such thing as working with a “snapshot” of a stream.

Second, every MapReduce computation has a “start” and a “finish.” Stream computations never end. The data stream goes on forever. Typical stream computations update some data structure based on the input stream, and either output a stream or answer queries on the data structure they maintain (e.g., how many items have we seen so far that satisfy certain conditions?). In the MapReduce model, the reduce step needs to see a key and all the values associated with the key; this is impossible in a streaming model.

Finally, in case of failure, it is always possible (even if inconvenient) to restart a MapReduce computation from scratch. This possibility may not exist for many stream computations; streams continue to flow at their own rate, oblivious to processing issues. The system should be able to cope with failures very quickly to avoid falling too far behind the stream.

Thus, we need a new framework for stream computations. We would like this framework to satisfy the following requirements:

• The framework should be easy to program. It should have a simple model that enables the rapid development of many applications. Ideally, it should retain the familiar Map and Reduce feel, to help developers quickly write stream applications.

• The framework should manage dynamic data structures as first-class citizens. Many developers are accustomed to reasoning with such data structures explicitly in their code, and many stream applications need to produce such structures for higher-level applications.

• The framework should deliver low-latency processing. Applications should stay near real-time with their input streams, and computed data (i.e., dynamic data structures) should be available for live querying.

• The framework should scale up on commodity hardware with computation and stream rate.

In the next section we describe such a framework, MapUpdate.

3. THE MAPUPDATE FRAMEWORK

We assume a hardware platform similar to MapReduce, i.e., a cluster of commodity machines. In practice, the machines need to be more memory-heavy and less disk-heavy than in a MapReduce cluster. The reason is that most stream computations read streams as they flow by and maintain in-memory data structures, in contrast to MapReduce computations that read and produce large files.

Events and Streams: Example events are tweets, Facebook updates, and Foursquare checkins. Formally, an event is a tuple 〈sid, ts, k, v〉, where

• sid is the ID of the stream that the event belongs to.

• ts is a timestamp. To ensure a well-defined output when merging multiple streams, we assume timestamps are global across all streams (local timestamps, if any, can be stored in the value v).

• k is a key. Keys have atomic values and need not be unique across events. For example, the key for a tweet might be the user ID. Keys are used to group events, similar to the way they are used in MapReduce. We assume a global key space across all streams, though our model can be easily extended to handle multiple key spaces (e.g., one per stream).



• Finally, v is a value, which can be any “blob” associated with the event. For example, if the event is a tweet with the key being the user ID, then the value can be the entire JSON object representing the tweet.

A stream is then a sequence of all events with the same sid, in the increasing order of timestamp ts (using a deterministic tie-breaking procedure). Streams can be external (e.g., the Twitter Firehose and the Foursquare-checkin stream) or internal, being generated by map and update functions as described below.
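For concreteness, the event tuple can be pictured as a small record type. The following Python sketch only illustrates the 〈sid, ts, k, v〉 structure described above; the class and field names are our own and are not part of Muppet's API.

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Event:
    """One event in a stream: <sid, ts, k, v> (illustrative only)."""
    sid: str   # ID of the stream this event belongs to
    ts: float  # global timestamp used to order events across streams
    k: str     # key used to group events (e.g., a Twitter user ID)
    v: Any     # arbitrary value "blob", e.g., the tweet's JSON object

def stream_order(events):
    # Events of one stream ordered by timestamp, with a deterministic tie-break.
    return sorted(events, key=lambda e: (e.ts, e.sid, e.k))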

Map Functions: A map function map(event) → event∗ subscribes to one or more streams. All events from these streams will be fed as input, one by one, in the increasing order of their timestamps, using a deterministic tie-breaking procedure, to the map function. For example, suppose a map function M subscribes to two streams S1 and S2, and suppose that S1 begins with an event e with timestamp 21:23 and that S2 begins with an event f with timestamp 21:25. Then event e will be fed to M, followed by event f, followed by whichever event in S1 or S2 has the next lowest timestamp, and so on.

Given an event as input, the map function processes it and then emits zero or more events to various streams. Thus, this function is analogous to the map function in MapReduce. Each output event has a timestamp greater than the timestamp of the input event, so that even if an output event is emitted to the same stream as the input event, the stream's events can be processed in timestamp order.

Update Functions and Slates: An update function update(event, slate) → event∗ also subscribes to one or more streams, and is also fed as input all events from these streams, one by one, in the increasing order of their timestamps, using a deterministic tie-breaking procedure. When the update function U takes as input an event e with key k, it is also given a slate SU,k.

The slate SU,k is an in-memory data structure that stores all important information that the update function U must keep about all the events with key k that U has seen so far. Within the update function U, there is one slate for each key. For example, if the key space is Twitter user IDs, then there is one slate per user ID for U, which may store summary information such as the number of tweets by that user, the time of the last tweet by the user, and the set of user interests that the update function has been able to infer from the tweets seen so far.

Recall that in MapReduce, the reduce function takes a key k and the list L of all values associated with key k, then “reduces” L to emit new <key, value> pairs. Update plays an analogous role. However, because an update function operates on streams, it cannot take as input the list of all events with key k: It has not seen all such events, and in any case, the list of events with key k that it has seen so far may already be too large to keep around.

To solve this problem, we introduce the notion of slate: a data structure that “summarizes” all events with key k that an update function U has seen so far. When given a new event e with the same key, U uses e to update the slate (hence the name “update” for this function). Thus, the slate is a live data structure that is continuously updated in (near) real time. Each slate also has a time-to-live parameter, which is set to “forever” by default but can be set to a concrete value after which the slate can be deleted to free up memory. When an update function U accesses a slate associated with a key k for the first time (either because this is the first time U sees an event with key k, or because the slate has been deleted after its time to live has expired), the update function must set up the set of variables it needs in the slate and initialize those variables.

It is important to emphasize that each pair <update U, key k> uniquely determines a slate, not that each key k uniquely determines one. Indeed, we can have one slate associated with a key k for an update function U1, and yet another slate associated with key k for another update function U2.

In other words, unlike the “memoryless” map functions, an update function has a memory. This memory is partitioned into pieces called slates, each associated with a particular event key. Each copy of an update function, when run on a machine, is in charge of a set of event keys, and hence will directly update the set of slates associated with those keys (see Section 4.1). The slates are updated and kept in the main memory of those machines, but also persisted on disk in a key-value store (see Section 4.2). The developer can reason about the slates explicitly, and query them live.
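As a rough illustration of this contract, an update function can be thought of as a callback that receives an event together with the slate for that event's key, mutates the slate, and optionally emits new events. The sketch below is our own Python paraphrase (the event is assumed to look like the Event record sketched earlier); it is not Muppet's actual API.

from typing import Any, Dict, List

Slate = Dict[str, Any]  # a per-(update function, key) summary, e.g., a JSON-like dict

def init_slate() -> Slate:
    # Called the first time this update function sees a key
    # (or after the slate's time-to-live expired and it was deleted).
    return {"count": 0}

def update_tweet_stats(event, slate: Slate) -> List:
    # Summarize all events seen so far for this key into the slate.
    slate["count"] += 1
    slate["last_ts"] = event.ts
    # An update function may also emit new events downstream (none here).
    return []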

In addition to updating the input slate, the update function may also emit new events, just like the map function.

MapUpdate Applications: A MapUpdate application is a workflow of map and update functions. Map functions consume streams and produce new streams, or emit new events into existing streams. Update functions consume streams, continuously update slates, and produce new streams or emit new events into existing streams.

Thus, the workflow is modeled as a directed graph (allowing cycles), whose nodes represent map and update functions, and whose edges represent streams. Figure 1(a) shows such a graph. The output of the MapUpdate application is a set of streams and slates, as specified by the application. The following two examples illustrate MapUpdate applications:

Figure 1: Example MapUpdate applications

Example 4. Figure 1(b) shows the workflow of the application that counts Foursquare checkins per retailer (see Example 1). This application starts with stream S1, the Foursquare checkin stream. A map function M1 inspects each checkin to see if the checkin happened at a retailer's location. If yes, M1 emits an event with the retailer ID (e.g., JCPenney, Walmart) to a new stream S2. An update function U1 subscribes to S2 and counts the number of checkins per retailer. Specifically, for each retailer U1 maintains a slate with a count variable initially set to 0. U1 then increments count by 1 every time it sees an event with the same retailer ID. The output of the application is the set of slates maintained by U1. ✷
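A minimal sketch of this workflow in Python, under the same assumptions as the earlier sketches (the Event record and the retailer-matching helper are our own illustrative names, not Muppet's API):

import json

RETAILERS = {"jcpenney": "JCPenney", "walmart": "Walmart", "best buy": "Best Buy"}

def extract_retailer(checkin: dict):
    """Hypothetical helper: return a retailer name if the checkin mentions one."""
    text = str(checkin.get("venue", "")).lower()
    return next((name for key, name in RETAILERS.items() if key in text), None)

def map_checkin(event) -> list:
    """M1: inspect a Foursquare checkin and emit a retailer event, if any."""
    checkin = json.loads(event.v) if isinstance(event.v, str) else event.v
    retailer = extract_retailer(checkin)
    if retailer is None:
        return []
    # Emit to the internal stream S2, keyed by retailer ID; the output timestamp
    # must exceed the input event's timestamp.
    return [Event(sid="S2", ts=event.ts + 1e-6, k=retailer, v=checkin)]

def update_retailer_count(event, slate) -> list:
    """U1: one slate per retailer; count checkins seen so far."""
    if "count" not in slate:   # first event for this retailer (or TTL expired)
        slate["count"] = 0
    slate["count"] += 1
    return []                  # the application's output is the set of slates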

Example 5. Figure 1(c) describes the application that monitors tweets to detect hot topics (see Example 2). The application starts with S1, the Twitter stream. A map function M1 analyzes each tweet t in S1 to infer a set of topics (taken from a predefined set of possible topics). Let m be the minute in which the timestamp of tweet t occurs (e.g., if the timestamp is 00:14 then m = 14; if the timestamp is 23:59 then m = 1439). For each inferred topic v, M1 publishes an event with the key v_m (i.e., a string that concatenates v and m) to a new stream S2, indicating that topic v is mentioned in a tweet that occurs in the minute m.

An update function U1 subscribes to stream S2 to count the frequencies of topics per minute. When U1 first encounters an event with key v_m, it creates a slate for this key, and sets count = 0 in the slate. Every time it sees an event with the same key, it increments count by 1. After a minute (counting from when it sees the first event with key v_m), U1 publishes an event (key = v_m, value = count) to a new stream S3, indicating that topic v is mentioned count times in the minute m.

An update function U2 monitors stream S3 to find hot topics. Specifically, when U2 sees an event (v_m, count), it computes count / avg_count_{v_m}. If this ratio exceeds a certain threshold, then U2 publishes an event with key v_m to a new stream S4, indicating that topic v is hot in the minute m. The quantity avg_count_{v_m} is the average count of topic v in the minute m. U2 keeps a slate for key v_m, with two summaries:

• total_count: the total number of times topic v has been mentioned in the minute m so far, since the first day, and

• days: the number of days since the application was deployed.

U2 uses these two quantities to compute avg_count_{v_m}. The output of the application is stream S4. ✷
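The per-minute counter U1 from this example can be sketched as follows. This is again an illustrative Python paraphrase; the "after a minute" behavior is simplified to a check on event timestamps rather than a real timer.

def update_topic_minute(event, slate) -> list:
    """U1: count events per key v_m and emit the count to S3 once the minute is over."""
    if "count" not in slate:
        slate["count"] = 0
        slate["first_ts"] = event.ts    # when the first event with this v_m arrived
        slate["emitted"] = False
    slate["count"] += 1
    out = []
    # Simplified stand-in for "after a minute passes": emit once 60 seconds of
    # event time have elapsed since the first event with this key.
    if not slate["emitted"] and event.ts - slate["first_ts"] >= 60:
        out.append(Event(sid="S3", ts=event.ts + 1e-6, k=event.k, v=slate["count"]))
        slate["emitted"] = True
    return out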

Figure 2: An illustration of the working of Muppet

We note that update functions maintain no “global” variables (across the slates). For example, update function U2 in Example 5 maintains a “local” variable days in each of the slates, even though all days variables have the same value across all slates, and thus technically could be “merged” into a single “global” variable. Update functions do not maintain global variables across the slates to avoid the concurrency-control problem in which multiple copies of the same update function, run on different machines, all attempt to modify the same “global” variables at the same time.

As described above, it is not difficult to show that if

• the map and update functions are deterministic, in that the input event completely determines the output events and slate updates;

• the events of the subscribed streams are fed into a map/update function in a well-defined order (which in this case is the increasing order of their timestamps, using a deterministic tie-breaking procedure); and

• the timestamps of output events are greater than that of the input event (to ensure that loop executions are well-defined),

then a MapUpdate application is well-defined, in that it generates well-defined streams and sequences of slate updates. Ideally, a MapUpdate implementation should produce these exact streams and slate updates. Due to practical constraints, however, it often can only approximate them, but should try to do so as closely as possible.

To write a MapUpdate application, a developer writes the necessary map and update functions, then a configuration file that includes the workflow graph.
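The paper does not show the configuration format, so the snippet below is only a guess at what such a workflow description could contain, expressed here as a Python dictionary: which functions exist, which streams each subscribes to, and which streams each may publish to (names taken from Example 5).

# Hypothetical workflow configuration for the hot-topics application (Example 5).
# Muppet's real configuration syntax is not shown in the paper.
hot_topics_workflow = {
    "functions": {
        "M1": {"type": "map",    "subscribes": ["S1"], "publishes": ["S2"]},
        "U1": {"type": "update", "subscribes": ["S2"], "publishes": ["S3"]},
        "U2": {"type": "update", "subscribes": ["S3"], "publishes": ["S4"]},
    },
    "external_streams": ["S1"],  # e.g., the Twitter Firehose
    "outputs": ["S4"],           # the application's output stream
}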

4. THE MUPPET SYSTEM

We now describe Muppet, our implementation of the MapUpdate framework. We describe Muppet 1.0 (Sections 4.1–4.4), developed at Kosmix, then Muppet 2.0 (Section 4.5), developed at WalmartLabs, which addresses several key limitations of Muppet 1.0.


4.1 Distributed Execution

To execute a MapUpdate application on a cluster of machines, Muppet starts up a set of programs on each machine. Each program executes a map or update function. The programs are called workers, and can be divided into mappers and updaters, depending on which function they run. To distribute the computation, each worker will be fed only events of certain key values, as determined by a hash function. Figure 2 illustrates this process. Given an application that runs a map function followed by an update function, suppose Muppet has decided to run five workers: three mappers M1-M3 for the map function, and two updaters U1-U2 for the update function.

Muppet begins by using a special mapper M0 to read from the input stream (see the figure). Given an event e with a key k, M0 hashes k to find out which mapper to send e to. Suppose this mapper is M1. M0 places event e in the queue for M1 (this queue is maintained in memory for the M1 program). M0 then reads the next event in the input stream, and so on.

The mapper M1 takes the next event from its queue, processes the event, and produces a set of events. For each produced event f, M1 hashes its key and the destination update function to find out which updater to send the event to. Suppose this updater is U2. M1 then places event f in the queue for U2. M1 then processes the next event from its own queue, and so on.

Thus, events flow continuously through the workflow of mappers and updaters. For each event that a worker produces, it must find out which workers to send the event to, where those workers are (i.e., on which machines), then place the event into the queues for those workers. One way to make this determination (and the way Muppet currently employs) is to give all workers the same hash function that maps <event key, destination map/update function> pairs to workers. That way, after producing an event, any worker can instantly calculate which worker the event hashes to, then contact that worker to place the event into the appropriate queue. Each worker has its own queue for input events.
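The routing rule, in which every worker applies the same hash over the pair (event key, destination function), can be sketched as follows. This is a simplified stand-in for Muppet's hash ring, assuming a fixed, hypothetical list of workers per function:

import hashlib

# Hypothetical cluster layout: for each map/update function, the workers
# ("host:port" addresses) that run it. In Muppet this is a hash ring.
WORKERS = {
    "U1": ["10.0.0.1:9001", "10.0.0.2:9001"],
    "M1": ["10.0.0.1:9002", "10.0.0.2:9002", "10.0.0.3:9002"],
}

def route(event_key: str, destination_fn: str) -> str:
    """Every worker uses this same function, so any worker can compute the
    destination worker for an event without consulting a master."""
    digest = hashlib.md5(f"{destination_fn}:{event_key}".encode()).hexdigest()
    workers = WORKERS[destination_fn]
    return workers[int(digest, 16) % len(workers)]

# All events with the same key for the same update function land on one worker:
assert route("Walmart", "U1") == route("Walmart", "U1")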

This mode of passing events is in stark contrast to MapReduce. There, after a mapper has produced output files, it contacts the master, which in turn notifies the appropriate reducers to get the files. This solution is not well suited for our setting because it is very important for many MapUpdate applications to minimize latency, i.e., to produce streaming output data quickly, in as “real time” as possible. To achieve this, we try to remove as many intermediaries as possible. Hence, Muppet lets the workers pass events directly to one another without going through any master. (The master in Muppet is used for handling failures; see Section 4.3.)

Hashing ensures that all events with the same key k will go to the same updater U (this is similar to MapReduce, where all events with the same key go to the same reducer). The updater uses the events to update a slate SU,k associated with key k. Only this updater can update SU,k, so there are no concurrent updates for SU,k.

4.2 Managing Slates

Persisting Slates in a Key-Value Store: As described, an updater U maintains a slate SU,k for each key k. These slates are cached in the memory of the machine running U. Muppet also persists them in a key-value store, for three reasons. First, the slates of U may outgrow the memory, in which case some of them have to be spilled to disk. Second, persistent slates help in resuming, restarting, or recovering the application from crashes. Finally, we often need to query the slates, which represent the computation of a MapUpdate application, long after the termination of the application. Muppet currently uses Cassandra as the key-value store.

A Cassandra cluster consists of a set of machines, each running the Cassandra program, all configured to recognize one another as parts of the same cluster. The cluster maintains a set of key spaces, each of which contains a set of column families. Each column family, in turn, stores data values indexed by <key, column> pairs.

A Muppet application's configuration file identifies a Cassandra cluster (by its machine names and service TCP port), a key space within the cluster, and a column family within the key space. Within this column family, Muppet stores slate SU,k (for the update function U and key k) as a value at row k and column U. Our applications often use JSON to encode slates for language independence and flexibility, so Muppet compresses each slate before storing it in the key-value store.

When the updater U needs the slate with key k, Muppet first checks the cache (in the memory of the machine running U). If the slate is not found, Muppet retrieves the slate from the Cassandra cluster by reading the value indexed by the pair <k, U>. The retrieved value is decompressed and then passed to the updater.

If the requested slate does not exist in Cassandra, either because the updater has never seen an event with this key, or because the slate has been deleted after its time to live expired, then Muppet initializes a new slate in the cache, then passes it to the requesting updater.
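The cache-then-store lookup described here can be paraphrased as below. The JSON encoding and compression follow the paper; the key-value client calls (kv_get/kv_put) are hypothetical stand-ins rather than actual Cassandra API calls.

import json
import zlib

slate_cache = {}  # in-memory cache on the machine running updater U: (U, k) -> slate

def load_slate(updater: str, key: str, kv_get) -> dict:
    """Return the slate for (updater, key): cache first, then the key-value store,
    otherwise a fresh empty slate. kv_get(row, column) is a hypothetical client call
    returning stored bytes or None (Muppet stores slates at row=key, column=updater)."""
    if (updater, key) in slate_cache:
        return slate_cache[(updater, key)]
    blob = kv_get(key, updater)
    slate = json.loads(zlib.decompress(blob)) if blob is not None else {}
    slate_cache[(updater, key)] = slate
    return slate

def flush_slate(updater: str, key: str, kv_put) -> None:
    """Persist a (possibly dirty) slate: JSON-encode, compress, write back."""
    blob = zlib.compress(json.dumps(slate_cache[(updater, key)]).encode())
    kv_put(key, updater, blob)  # hypothetical write to row=key, column=updater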

Using SSDs and Caching Slates: We run our Cassandra key-value store on solid-state flash-memory storage (SSDs). This allows us to devote Cassandra's memory to buffering writes, while caching reads in the slate cache (i.e., the memory of machines running updaters). We found this solution very helpful for several reasons:

• When Muppet starts up, its slate cache is empty, so early update events may require many row fetches from the key-value store. Fast random access helps the store respond to this volume of reads more quickly, warming the slate cache.

• While running, Muppet often needs random-seek I/O capacity to fetch uncached slates. Meanwhile, Cassandra also requires I/O capacity for periodic compactions, thus slowing down Muppet. Using SSDs provides high I/O capacity to help us sustain both needs.

• Because applications often update popular slates repeatedly, we minimize disk I/O for writing at the key-value store if we devote the store's main memory to buffering writes. Overwrites of the same row in the key-value store are relatively inexpensive if the row is still in memory at the time of the write, so it is advantageous for us to delay flushing the writes (i.e., the memory table) to disk as long as possible. Further, the more times a row is flushed to disk by the store since its last file compaction, the more files will have to be checked for the row when it needs to be retrieved.


Flushing, Quorum, and Time-to-Live Parameters: Muppet applications can adjust a set of parameters to reach the desired level of performance, reliability, and consistency. First, dirty (updated) slates are periodically flushed to the key-value store. The application can set the flushing interval, ranging from “immediate write-through” to “only when evicted from cache.”

Second, the application can specify the desired quorum used by the Cassandra store for a successful read/write operation: any single machine to which the data is assigned for storage, a majority of replicas where the data is assigned, or all of the replicas where the data is assigned.

Third, key-value stores like Cassandra allow their clients to specify a time-to-live (TTL) parameter for each write. Correspondingly, each updater function in a Muppet application can have a TTL value configured for its slates. Slates that have not been updated (written) for longer than the TTL value may be garbage-collected by the key-value store, resetting to an empty slate at that time.

The TTL parameter helps contain the amount of storage used by a Muppet application over time. Many such applications only care about current activities in their streams, declining to receive or generate events on obsolete keys. For example, an application may want to keep track of only active Twitter users (e.g., those who have tweeted at least once in the past quarter), a working set which is typically much smaller than the set of all Twitter users who have ever tweeted.

By making TTL a user-configured parameter, application developers can keep slates as long as needed without having to manually delete slates that are no longer useful. This setting is configurable per update function because different update functions often track different kinds of data, thus requiring different shelf lives.
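As a hypothetical illustration of these three knobs (the paper does not give the exact configuration syntax), a per-updater tuning section might look like this:

# Illustrative only: flush interval, store consistency level, and slate TTL,
# configured per update function as described above.
slate_settings = {
    "U1": {
        "flush": "every_30s",             # between write-through and on-eviction
        "quorum": "ONE",                  # single replica; could be "QUORUM" or "ALL"
        "slate_ttl_seconds": 90 * 86400,  # drop slates idle for roughly a quarter
    },
    "U2": {
        "flush": "on_eviction",
        "quorum": "QUORUM",
        "slate_ttl_seconds": None,        # default: keep forever
    },
}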

4.3 Handling Failures

We now describe how Muppet handles two major types of failure: machine crash and queue overflow.

Machine Crash: In Muppet each worker keeps track of all failed machines. Recall that when a worker A needs to pass an event, it determines the worker B to which to send the event by hashing the key and destination updater function of the event (technically accomplished using a hash ring). Worker A reaches worker B to place the event on B's incoming-event queue.

If A cannot contact B, then it assumes the machine hosting B has failed, and A contacts the master to report the machine failure. The master broadcasts the machine failure to all workers, which update their lists of failed machines accordingly. Since all workers use the same hash ring, from then on all events with the same key will be routed to a worker C instead of the (now failed) worker B. The event that failed to reach B is lost (and logged as lost) rather than sent through the event-dispatch process again.

In Muppet, since events typically flow through the system at high speed, and since a worker is frequently contacted, in most cases the above solution allows us to detect worker failures and recover from them in a timely fashion, and is preferable to the MapReduce solution of having the master ping the workers periodically to detect worker failures.

When an updater fails, whatever changes it has made to the slates that have not yet been flushed to the key-value store are lost. Furthermore, all events in its queue are also lost. Currently, low latency is far more important for most of our Muppet applications, while failing to process some tweets, for example, is acceptable. Hence, we do not attempt to recover the lost events in the queue. Instead, we focus on quickly detecting the failed worker and redirecting events to another worker, thereby minimizing our latency and losses. Developing a replay capability to recover the lost events is a subject of future work.

Queue Overflow: When a worker A tries to place an event into the queue of a worker B, if the queue of B is full (i.e., its size has reached a pre-specified limit), B will decline to accept the event. In this case A has to invoke a queue overflow mechanism.

The queue overflow mechanism can take one of several actions. First, it can decide to drop the incoming events (until B can accept events again). The dropped events can be logged for later processing and debugging. Second, it can direct the incoming events to a specified “overflow” stream whose recipients can process such events. The overflow stream can be connected to map and update functions that implement “slightly degraded” service, for example by substituting expensive operations in the main workflow pipeline with approximate operations that are cheaper to execute. Finally, the overflow mechanism can also decide to slow down the pace of passing events among the mappers and updaters (as discussed in more detail in Section 5).
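A schematic of these three choices, in the same illustrative Python style (the policy names and the send/throttle helpers are ours, not Muppet's):

import logging

def handle_full_queue(event, policy: str, send, throttle):
    """What a sending worker might do when the destination queue declines an event.
    send(stream_id, event) and throttle() are hypothetical helpers."""
    if policy == "drop":
        logging.warning("dropping event %s (destination queue full)", event.k)
    elif policy == "overflow_stream":
        # Redirect to an overflow stream handled by cheaper, degraded operators.
        send("overflow", event)
    elif policy == "throttle":
        # Slow down the pace of passing events among mappers and updaters
        # (e.g., pause reading from the input stream), then try again.
        throttle()
    else:
        raise ValueError(f"unknown overflow policy: {policy}")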

4.4 Reading Slates

As Muppet runs a MapUpdate application, the application maintains live state in its updaters' slates. This state is often the value of the application's computation, and is often read by higher-level applications. To make this possible, Muppet provides a small HTTP server on each node for slate fetches.

The URI of a slate fetch includes the name of the updater and the key of the slate, which together uniquely identify a slate. The fetch retrieves the slate from Muppet's slate cache (on the appropriate machine, forwarding the request internally if necessary) rather than from the durable key-value store, to ensure an up-to-date reply.
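For instance, a higher-level application could poll a slate over HTTP roughly as follows. The URI layout (/slate/<updater>/<key>) and the port are made up for illustration; the paper only states that the URI contains the updater name and the slate key.

import json
import urllib.parse
import urllib.request

def fetch_slate(host: str, updater: str, key: str) -> dict:
    """Fetch a live slate from a Muppet node's HTTP server (hypothetical URI layout)."""
    url = f"http://{host}:8080/slate/{updater}/{urllib.parse.quote(key)}"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read())

# e.g., the live checkin count for Walmart maintained by U1 in Example 4:
# print(fetch_slate("muppet-node-1", "U1", "Walmart").get("count"))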

4.5 Developing Muppet 2.0

So far we have described Muppet 1.0, which was developed at Kosmix. In Muppet 1.0, each worker was implemented as two tightly coupled processes: a Perl process called a conductor, and a process running the JVM, called a task processor here.

The conductor is in charge of all “Muppet logistics,” including retrieving the next event from its queue of incoming events; sending the event (together with a slate, if necessary) to the JVM task processor; receiving the output events (and a modified slate, if applicable) from the JVM task processor; hashing the output events to their appropriate destinations; enqueueing the events at their destination workers' queues; and so on. The JVM task processor's sole task is to run the map or update code to process the event passed to it by the conductor, then send the output events back to the conductor.

As described, Muppet 1.0 suffered from several limitations:

• Recall that a machine typically runs a set of workers. Each worker on the machine must load its own copy of the map or update code so that it can run its JVM task processor. These duplicate copies of code waste memory.

• Passing data between processes (e.g., passing events back and forth between the conductor and the task processor) can be computationally wasteful.

• Each worker on a machine maintains its own slates (in the conductor). Thus, the slate cache on the machine is technically the set of disparate slates maintained by the workers. Maintaining the slates disparately can result in a significant waste of memory. For example, suppose we determine that we need to cache a working set of 100 popular slates on a single machine to run update events efficiently. If we run a single updater on the machine, we could reasonably assign the update function a slate cache of 100 slates to capture this working set. By contrast, if we run five updaters on the same machine, Muppet divides the slates of the update function among them. Because the keys of the popular slates may be hashed unevenly among them (for example, one of the five updaters might get 25 of the popular slates, not 20), we have to configure a larger slate cache per updater (e.g., 25 slates each and not 20) to cache the same working set (yielding a larger total slate cache of 125 slates instead of 100).

• Finally, it is difficult to fully utilize the number of cores on the machine, because the number of workers per machine is typically set based on the nature of the application, not based on the number of cores. On a machine with numerous CPU cores, it may be impractical to run as many workers as cores for every map and update function, so as to utilize all cores regardless of which function has the most events to process at any moment. As the number of cores and the number of map and update functions grow, the number of workers would grow, amplifying the memory problems described above. The more numerous processes can also require more context switching to execute when events distribute widely among them.

Muppet 2.0, developed at WalmartLabs mostly in Java and Scala, addresses these limitations. In Muppet 2.0, we start up many threads of execution in a dedicated thread pool per machine. Each thread in this thread pool is now a worker, capable of running any map or update function. It is helpful to specify as large a number of threads as the parallelization of the application code allows. For example, the number may be as large as the number of CPU cores available on a machine, or smaller if the application's operators depend on a bottleneck resource that has a lower parallel-scaling limit.

Besides the worker threads, each machine also runs a thread to provide background I/O to the durable key-value store (so that writes to the store can proceed without blocking map and update calls), and a thread pool to provide HTTP service for slate reads and basic status information (such as the event count of the largest event queues).

Map and update functions are then written so that they can be run in multiple threads concurrently. To conserve memory, each map and update function is constructed only once and shared by all threads. All slates are now kept in a single “central” slate cache, not scattered in multiple conductor processes as in Muppet 1.0.

To process events, Muppet 2.0 maintains a queue per worker thread. When an event arrives at the machine, it is hashed by event key and destination updater function into a primary event queue and a secondary event queue. If the thread for either queue is already processing this event key for this update function, then the event is placed in the corresponding queue. Otherwise, the event is placed in the primary queue unless the secondary queue is significantly shorter, in which case the event is placed in the secondary queue instead. Each thread then takes the next event from its queue; executes the map or update function, depending on what the event requires; updates the appropriate slates if necessary; sends out the output events; takes the next event from its queue; and so on.
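The primary/secondary queue choice can be sketched as follows; the hash and the "significantly shorter" threshold are our own illustrative choices, not Muppet's:

def choose_queue(event_key: str, update_fn: str, queues, busy_keys, factor: float = 2.0) -> int:
    """Pick one of two candidate worker-thread queues for an incoming event.

    queues:    list of per-thread event queues (each supports len()).
    busy_keys: busy_keys[i] is the (update_fn, key) the thread of queue i is
               currently processing, or None.
    """
    h = hash((update_fn, event_key))
    primary = h % len(queues)
    secondary = (h // len(queues)) % len(queues)

    # If either target thread is already working on this (update_fn, key),
    # keep the event on that thread.
    for i in (primary, secondary):
        if busy_keys[i] == (update_fn, event_key):
            return i

    # Otherwise prefer the primary queue, unless the secondary is much shorter.
    if len(queues[secondary]) * factor < len(queues[primary]):
        return secondary
    return primary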

The dispatch of an incoming event to only one of two target queues, instead of to potentially any of the queues, brings several benefits. First, an incoming event locks no more than two queues to be dispatched to one of them, regardless of the number of threads running map and update operations, reducing queue-lock contention when receiving incoming events.

Second, events of the same key for the same update function do not scatter across many threads on the same machine, reducing contention for the same slate among threads when those events get executed.

Finally, should an incoming event's primary queue be already heavily loaded by some other events, the incoming event can be placed on a secondary queue to better balance event load across available cores.

Thus, unlike Muppet 1.0, in which only one worker can process events of the same key for a particular update function, ensuring no slate contention, in Muppet 2.0 two workers can vie for the same slate, but this contention is limited to at most two workers per slate.

A fundamental reason why Muppet 2.0 allows slate contention is that if only one worker can process events of the same key, that worker can become a hotspot: if it is overloaded by a huge number of events with key k1 already in its queue, a long time may pass before the worker gets around to processing events with some key k2. Hence, Muppet 2.0 allows events with key k2 to be placed into the queue of a second worker, if the queue of the first worker is already too long. This helps relieve the hotspot at the first worker, but can introduce slate contention when both the first worker and the second worker get events with key k2 enqueued. In practice we have found that if the contention for any slate is limited to just two workers, it does not cause noticeable problems for our current applications.

As described, Muppet 2.0 addresses the above four limitations of Muppet 1.0. Each worker is now a thread that can execute any map or update function, not a pair of tightly coupled processes that can execute a single map or update function. All threads share the same map and update code, thus eliminating the waste of memory to hold redundant copies of the code. Passing data between processes is eliminated within each machine. All slates are now kept in a central pool, eliminating potentially underutilized slate-cache memory. Finally, the number of worker threads is set to maximize the potential for parallel execution on multicore machines.


5. MUPPET EXPERIENCE AND ONGOING EXTENSIONS

The first version of Muppet went into production at Kosmix in mid-2010. Since then we have improved Muppet several times, as discussed above, and used it extensively at Kosmix and later at WalmartLabs. At Kosmix it was used to process the Twitter Firehose and the Foursquare-checkin stream. By early 2011 Muppet processed over 100 million tweets and 1.5 million checkins per day. It kept over 30 million slates of user profiles and 4 million slates of venue profiles. It ran over a cluster of tens of machines, and achieved a latency of under 2 seconds. Muppet was used to power TweetBeat, the flagship product of Kosmix, and now ShopyCat, a popular Facebook product recently released by WalmartLabs. About 16 developers (about half of the developer workforce at Kosmix) have used Muppet to quickly write about 15 applications; a number of these developers have worked with Muppet applications at Kosmix and selected it again for new applications at WalmartLabs. By June 2012, our Cassandra store has grown to maintain over 2 billion slates for various production Muppet applications. We now discuss our experience running Muppet and several ongoing extensions.

Limiting Slate Sizes: We observe that slates can grow quite large, and updaters that maintain large slates can run more slowly due to the overhead. Consequently, we encourage developers to keep individual slates small, e.g., many kilobytes rather than many megabytes.

Changing the Number of Machines on the Fly: Muppet runs on a cluster of machines. Currently the number of machines in the cluster cannot be changed on the fly. To add more machines, for example, we have to restart the Muppet application. While this setting has proven sufficient for our applications so far, one can imagine scenarios where it is desirable to be able to change the number of machines on the fly.

Hence, we are currently exploring this option. The main challenge is how to redistribute the workload. For example, suppose that a machine A is currently overwhelmed with processing events with key k, so we want to add a new machine B to help with this. Should we move some of the events with key k to machine B? If so, both machines A and B will be processing events with key k. The slate for these events would have to be replicated at both A and B, and coordinating the two slate copies would be highly difficult.

Handling Hotspots: The distribution of event keys can be strongly skewed (e.g., follow a Zipfian distribution). Consequently, updaters can receive widely varying loads, and an updater that receives an overwhelming load can potentially become a hotspot.

We already discussed one way to handle such hotspots: sending events with the same key to up to two threads instead of one (see Section 4.5). This approach allows Muppet to make progress on processing events with the same key on a secondary thread if the first thread is currently bogged down with other events, and at the same time reduces the workload of the first thread. This load distribution comes at the cost of some contention between the two threads for ownership of the same slate.

Another way to handle hotspots is to exploit the fact that numerous update computations are associative and commutative, to distribute the workload of an overwhelmed updater among a set of updaters. The following simple example illustrates this idea:

Example 6. Consider again the application that counts Foursquare checkins per retailer in Example 4. In this application, a map function examines each checkin to emit the name of a retailer (if any), such as JCPenney, Best Buy, or Walmart. An update function then counts the emitted location events per retailer.

Let U be the updater that counts Best Buy events. Suppose, hypothetically, that a lot of people are checking into Best Buy: U can quickly become a hotspot as it becomes overwhelmed by the number of arriving Best Buy events. To address this problem, observe that counting Best Buy events is associative and commutative. Hence, instead of using just a single updater U, we can use a set of updaters, each of which counts just a subset of Best Buy events. We can then sum the counts of these updaters.

Specifically, we can modify the map function to replace the single key “Best Buy” with two keys, say “Best Buy1” and “Best Buy2.” In effect, the map function partitions the set of events with key “Best Buy” into two subsets with keys “Best Buy1” and “Best Buy2,” respectively. Next, we modify the update function so that it regularly emits the counts of “Best Buy1” events and “Best Buy2” events, respectively, as new events under the key “Best Buy.” Finally, we write a new update function that receives the events with key “Best Buy” and determines the total count of “Best Buy1” events and “Best Buy2” events. ✷
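A sketch of this key-splitting idea, building on the illustrative map_checkin and Event sketches from Example 4 (the split factor of two and the re-emission cadence are our own choices):

import random

def map_checkin_split(event) -> list:
    """Modified M1: spread "Best Buy" events over two sub-keys to relieve the hotspot."""
    out = []
    for e in map_checkin(event):                  # the original map function from Example 4
        if e.k == "Best Buy":
            e = Event(sid=e.sid, ts=e.ts, k=f"Best Buy{random.randint(1, 2)}", v=e.v)
        out.append(e)
    return out

def update_partial_count(event, slate) -> list:
    """Counts one sub-key ("Best Buy1" or "Best Buy2") and periodically re-emits its
    partial count under the original key "Best Buy"."""
    slate["count"] = slate.get("count", 0) + 1
    if slate["count"] % 100 == 0:                 # "regularly" is approximated here
        partial = {"subkey": event.k, "count": slate["count"]}
        return [Event(sid="S_counts", ts=event.ts + 1e-6, k="Best Buy", v=partial)]
    return []

def update_total_count(event, slate) -> list:
    """Final updater for key "Best Buy": keep the latest partial count per sub-key
    and expose their sum in the slate."""
    slate.setdefault("partials", {})[event.v["subkey"]] = event.v["count"]
    slate["total"] = sum(slate["partials"].values())
    return []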

We have discussed how to redistribute the workload of a hotspot updater among a set of updaters. Yet another way to handle hotspots is to slow down the pace of events in the workflow. In our settings, some Muppet applications do not need near-real-time latencies (e.g., in milliseconds or seconds). Examples include applications that do not use time-related or time-sensitive data (and simply tap the MapUpdate framework as a convenient way to implement a workflow for machine-scalable distributed computation) and applications that run on legacy tweets. In such cases, accepting longer latencies in exchange for stable operation is often a reasonable trade-off. Consequently, when Muppet detects a hotspot, it can slow down the pace at which it consumes events from its input streams (e.g., the Twitter Firehose) until the hotspot updater has a chance to catch up. We call this approach source throttling.

It is also possible to throttle the pace of events at any later point in the workflow, not just at the input streams, but if not done very carefully, doing so can quickly introduce deadlocks. To see why, consider an updater U that emits events into its input streams (thus introducing a loop). Suppose U consumes an event e and is about to emit 10,000 events back into U's input stream, and emitting 10,000 events all at once would overwhelm U. We may be tempted to slow down the pace at which events are emitted: Muppet could emit the 10,000 events one by one, in an incremental fashion, as soon as U is ready to consume its next event. Unfortunately, this approach would introduce a deadlock. After emitting the first of the 10,000 output events, Muppet would wait for U to finish processing the current event (i.e., event e) before emitting the second of the 10,000 output events. However, U cannot finish processing event e until we have emitted all 10,000 output events.


Note that the above scenario does not arise in the case of slowing down the pace of consuming events from the application's input streams (e.g., the Twitter Firehose), because we assume that no mappers or updaters can emit events into such streams.

Placing Mappers and Updaters: Currently the placement of mappers and updaters in Muppet is in effect decided by the hash function that hashes event keys to machines and workers. We are exploring how to place mappers and updaters so that they are close to their data, in a way that reduces network traffic.

This problem is nontrivial in part because Muppet may not know in advance which event streams will have the most data. For example, let us revisit the application that counts Foursquare checkins per retailer in Example 4. In it, a map function emits an event to an update function each time a checkin for a recognized retailer arrives. Suppose, for simplicity, that checkins arrive at a particular machine m, and a mapper there runs the map function. Which keys (and corresponding slates) for the update function should go to machine m, and which ones should be assigned elsewhere? If the most popular retailers' slates reside on machine m, then the smallest number of events from the mapper have to traverse the network (and pay the corresponding latency costs or consume the corresponding network utilization) to reach an updater. Unfortunately, such a determination depends on the contents of the checkin events themselves, so Muppet cannot determine this assignment in advance. Muppet cannot even know whether perturbations in retailer popularity are transient spikes to absorb or changing trends that require a different slate-to-machine assignment. Finally, applications typically have multiple update functions that may be directly or indirectly connected by event flows, so moving slates to optimize network traffic into one update function may affect the network usage of events from it to a subsequent update function. (Key and slate assignments that reduce network traffic for the input or output of one function may increase the network traffic coming into or out of another function.)

Bulk Reading of Slates: In many applications at Kosmix and WalmartLabs, users want to make periodic dumps of many slates. In such cases, repeated HTTP slate fetches can be expensive (in network round trips) or difficult (because the query agent must know all the slate keys in advance to enumerate the slate requests).

To address this problem, we have advised bulk-dump users to log the relevant slate data that they wish to process in bulk later as a part of the applications' update functions. This approach allows users to write less than the entire slate to minimize the dumped data, and provides steady-state write behavior that avoids sudden bulk I/O, which can affect the performance of the machines supporting the application. These writes can be streamed using a library of the user's choice into HDFS, for example, if further processing in Hadoop is desired.
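As an illustration, the sketch below extends the counting-updater pattern of Figure 4 so that each update also appends a compact record (key and running count) to a local log for later bulk processing. The Updater interface is the one described in the Appendix; the log location, record format, and the decision to write only the count rather than the whole slate are assumptions made for this example.

package com.walmartlabs.example;

import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

import com.kosmix.muppet.application.Config;
import com.kosmix.muppet.application.binary.PerformerUtilities;
import com.kosmix.muppet.application.binary.Updater;

// Sketch of logging slate data from inside an update function so it can be
// bulk-processed later (e.g., in Hadoop). The log path and record format are
// illustrative; a real application might instead stream these records into
// HDFS with a library of its choice.
public class LoggingCounter implements Updater {
    private final Charset charset = Charset.forName("UTF-8");
    private final String name;
    private Writer log;

    public LoggingCounter(Config config, String n) {
        name = n;
        try {
            // One append-only log per updater instance avoids contention on a
            // logger shared across workers.
            log = Files.newBufferedWriter(Paths.get("/tmp/" + n + ".log"), charset,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            log = null;  // fall back to counting without logging
        }
    }

    @Override
    public String getName() { return name; }

    @Override
    public void update(PerformerUtilities submitter,
                       String stream, byte[] key, byte[] event,
                       byte[] slate) {
        int count = 0;
        if (slate != null) {
            try {
                count = Integer.parseInt(new String(slate, charset));
            } catch (NumberFormatException e) {
                count = 0;
            }
        }
        ++count;
        submitter.replaceSlate(Integer.toString(count).getBytes(charset));

        // Write only the fields needed for the later bulk job, not the whole
        // slate, to keep the dumped data small and the write load steady.
        if (log != null) {
            try {
                log.write(new String(key, charset) + "\t" + count + "\n");
                log.flush();
            } catch (IOException e) {
                // A real application would report logging failures; the slate
                // update above has already succeeded regardless.
            }
        }
    }
}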

Another approach users can take is to request large-volume row reads from the durable key-value store itself. Users that choose to do so must know how slates are written to the key-value store (an implementation detail of Muppet, described in Section 4.2) to extract the slates back from the appropriate key-value-store queries.

In the future, we would like to revisit how users consume MapUpdate-application slates for later Hadoop processing, so that we can better simplify and automate this integrated use case.

Managing Side Effects: We have found that applications sometimes wish to act on events in ways beyond updating slates or publishing events. Such actions fall outside the scope of the current MapUpdate framework itself, and we currently leave it to the application to carry out such actions in its own map and update functions. For example, applications may want to log relevant slate data for later bulk processing, as discussed above. As another example, developers often instrument map and update functions to log certain data for later debugging. As yet another example, an application may wish to have a map or update function make an HTTP request to a server when a criterion is satisfied, so that the outside server is notified rather than having to sample or probe slates repeatedly to make the determination.

While leaving it to applications to carry out such side-effect actions, we do advise developers to be careful of the subtle effects such actions can have on the Muppet application. For example, asking mappers and updaters to write to a common log can introduce lock contention for the common logger, thereby dramatically slowing down the workers.
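The HTTP-notification example above might look like the following sketch, in which an update function pings an outside server the first time a per-key count crosses a threshold. The endpoint URL, the threshold, and the way the notification flag is recorded in the slate are hypothetical choices for illustration; only the Updater interface itself comes from Muppet.

package com.walmartlabs.example;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.Charset;

import com.kosmix.muppet.application.Config;
import com.kosmix.muppet.application.binary.PerformerUtilities;
import com.kosmix.muppet.application.binary.Updater;

// Sketch of a side-effecting update function: when a per-key count crosses a
// threshold, notify an outside server with an HTTP request so the server does
// not have to poll slates. The endpoint and slate layout are illustrative.
public class ThresholdNotifier implements Updater {
    private static final int THRESHOLD = 10000;
    private final Charset charset = Charset.forName("UTF-8");
    private final String name;

    public ThresholdNotifier(Config config, String n) { name = n; }

    @Override
    public String getName() { return name; }

    @Override
    public void update(PerformerUtilities submitter,
                       String stream, byte[] key, byte[] event,
                       byte[] slate) {
        int count = 0;
        boolean notified = false;
        if (slate != null) {
            try {
                // Assumed slate format: "<count>" or "<count> notified".
                String[] parts = new String(slate, charset).split(" ");
                count = Integer.parseInt(parts[0]);
                notified = parts.length > 1;
            } catch (NumberFormatException e) {
                count = 0;
            }
        }
        ++count;

        if (count >= THRESHOLD && !notified) {
            notify(new String(key, charset), count);
            notified = true;  // remember the side effect so we fire only once
        }
        String updated = notified ? count + " notified" : Integer.toString(count);
        submitter.replaceSlate(updated.getBytes(charset));
    }

    private void notify(String key, int count) {
        try {
            // Hypothetical endpoint; a real application would make this configurable.
            URL url = new URL("http://alerts.example.com/notify?key="
                              + URLEncoder.encode(key, "UTF-8") + "&count=" + count);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(1000);  // short timeouts keep the updater from stalling
            conn.setReadTimeout(1000);
            conn.getResponseCode();        // issue the request; ignore the response
            conn.disconnect();
        } catch (IOException e) {
            // Swallow notification failures; the slate update must still proceed.
        }
    }
}

Keeping the request small and the timeouts tight matters here: the side effect runs inside the updater, so a slow external server directly slows down event processing for that key.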

6. RELATED WORK

We have compared our work with MapReduce throughout this paper.

A number of recent works (such as MapReduce Online [7], Nova [18], work by Li et al. [14], and Incoop [4]) have extended MapReduce to perform incremental batch processing. MapReduce Online pipelines data between the map and reduce operators by calling reduce with partial data for early results. To retain the MapReduce programming model, it runs reduce periodically (as a minimum interval of time passes or a batch of new data arrives), retaining some of its blocking behavior. Nova determines and provides the deltas between increments directly to workflows written for Pig, but its authors warn that this approach is more suitable for large batches than small increments because of the overhead costs in underlying systems. Systems such as Incoop apply memoization to the results of partial computations so that subsequent computations can efficiently reuse results for inputs that were unchanged by additional incremental data. The prototype one-pass analytics platform described in [14] optimizes MapReduce jobs by (among other improvements) exploiting main memory to pre-combine map outputs by key (when the MapReduce job has an optional combine function defined); this optimization is most nearly analogous to how Muppet exploits main memory to cache slates (indexed by updater operator and event key), minus any event-serialization considerations for a slate.

By contrast, MapUpdate uses slates to summarize past data, so an updater can immediately process each event (and change the slate) as the event comes in. This approach allows us to stream events through the system with millisecond-to-second latencies.

Many streaming query-processing systems, such as continuous query systems, have been developed in the database community [10] and in industry (e.g., Aurora [21], commercialized as StreamBase Systems, and Borealis [1]; CloudScale [6]; STREAM [3]; SPADE [11] for System S, commercialized as IBM InfoSphere Streams; and Telegraph [5], commercialized into Truviso). Our work differs from these systems in two important aspects. First, these systems often employ declarative query languages over structured data with known schema. In contrast, we make few assumptions about the structure of the data, and adopt a MapReduce style in which applications are decomposed into a procedural workflow of custom code. Second, much work has focused on optimizing query processing over data streams in a relational-database style (e.g., how to factor operations out of multiple queries, and push operators to optimal locations for query execution). In contrast, we focus on how to efficiently execute relatively arbitrary Maps and Updates over a cluster of machines to achieve low latency and high scalability. Like Flux [19], Muppet strives to distribute the input load of each update operator across multiple machines, but Muppet does not currently change its load partitioning dynamically except when a machine fails. Unlike Spark Streaming [20], which offers APIs for Scala to enable developers to write programs modeling streams as a sequence of small batch computations, Muppet offers a simple MapReduce-style framework, MapUpdate, to enable developers to write continuously updating streaming applications.

Other avenues of low-latency-application development are available, including specialized stream-processing chips such as GPUs programmed with OpenCL [13] or CUDA [17], and high-bandwidth remote-memory access (RDMA) over specialized high-speed interconnects (such as InfiniBand). Unlike computations on GPUs, which often perform well computing similar operations in parallel on a vector of values, MapUpdate is designed to allow arbitrary computation on general-purpose CPU cores for each input event, including data-dependent recursion or event publication. Unlike RDMA, which allows an application to span multiple machines using a shared-memory model implemented on high-speed networks, Muppet is designed to run on commodity hardware, allowing us to build large slate caches using the union of main memory on multiple machines linked by inexpensive gigabit Ethernet networks. The MapUpdate model, in particular, allows us to explicitly shard application state across machines to sidestep an explicit need for fast shared memory between them.

Our work is also similar in spirit to recently proposed distributed stream-processing systems, such as S4 [16] and Storm [15]. These systems, however, leave it to the application to implement and manage its own state. Our experience suggests that this is highly nontrivial in many cases. By contrast, Muppet transparently manages application storage, which in our case takes the form of slates, and makes these slates accessible as the continually updated computed values of a streaming application.

Indeed, the explicit, first-class management of application memory in the form of slates is a key distinguishing aspect of our work, in sharp contrast to current work in incremental MapReduce, RDBMS-style stream processing, and industrial distributed stream-processing systems.

7. CONCLUSION

We have motivated the need for a MapReduce-style framework for processing fast data. We have described such a framework, MapUpdate, and our implementation of MapUpdate at Kosmix and WalmartLabs, Muppet. Throughout the discussion we have tried to motivate and highlight the differences between MapReduce and MapUpdate. The key differences include the need to redefine applications and user-defined functions to operate over streams; the need for slates, continuously updated data structures that "summarize" the events seen so far; the need for workflows of Maps and Updates; the importance of minimizing latency and how that influences design decisions on distributing and passing events and handling failures; and finally the need to persist slates, as the semantics of the application dictates.

We have also reported on our experience using Muppet at Kosmix and WalmartLabs. Overall, we conclude that a MapReduce-style framework to process fast data, such as implemented in Muppet, is feasible and highly promising, in terms of allowing developers to quickly write fast-data applications, and to achieve low latency and high scalability in those applications. Learning from our experience, we are currently deploying Muppet to more applications and are extending it in several important directions.

8. REFERENCES

[1] D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In CIDR, pages 277–289, 2005.

[2] AWS Case Study: foursquare. http://aws.amazon.com/solutions/case-studies/foursquare/.

[3] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. In PODS, pages 1–16, 2002.

[4] P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, and R. Pasquin. Incoop: MapReduce for Incremental Computations. In SOCC, pages 7:1–7:14, 2011.

[5] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In CIDR, pages 269–280, 2003.

[6] CloudScale. http://www.cloudscale.com/.

[7] T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online. In NSDI, pages 313–327, 2010.

[8] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, pages 137–150, 2004.

[9] Facebook, Inc. Amendment No. 8 to Form S-1 Registration Statement Under The Securities Act of 1933. http://sec.gov/Archives/edgar/data/1326801/000119312512235588/d287954ds1a.htm, 2012.

[10] M. Garofalakis, J. Gehrke, and R. Rastogi, editors. Data Stream Management. Springer, 2009.

[11] B. Gedik, H. Andrade, K.-L. Wu, P. S. Yu, and M. Doo. SPADE: The System S Declarative Stream Processing Engine. In SIGMOD, pages 1123–1134, 2008.

[12] Going Social. http://www.economist.com/events-conferences/americas/information-2012?bclid=1682222098001&bctid=1684182003001. An interview with Twitter CEO Dick Costolo at Ideas Economy: Information.

[13] Khronos Group. OpenCL. http://www.khronos.org/opencl/.

[14] B. Li, E. Mazur, Y. Diao, A. McGregor, and P. J. Shenoy. A Platform for Scalable One-Pass Analytics using MapReduce. In SIGMOD, pages 985–996, 2011.

[15] Storm. https://github.com/nathanmarz/storm.

[16] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed Stream Computing Platform. In ICDMW, pages 170–177, 2010.

[17] J. Nickolls, I. Buck, M. Garland, and K. Skadron. Scalable Parallel Programming with CUDA. Queue, 6(2):40–53, 2008.

[18] C. Olston, G. Chiou, L. Chitnis, F. Liu, Y. Han, M. Larsson, A. Neumann, V. B. N. Rao, V. Sankarasubramanian, S. Seth, C. Tian, T. ZiCornell, and X. Wang. Nova: Continuous Pig/Hadoop Workflows. In SIGMOD, pages 1081–1090, 2011.

[19] M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, and M. J. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. In ICDE, pages 25–36, 2003.

[20] M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. In HotCloud, 2012.

[21] S. Zdonik, M. Stonebraker, M. Cherniack, U. Cetintemel, M. Balazinska, and H. Balakrishnan. The Aurora and Medusa Projects. IEEE Data Engineering Bulletin, 26(1):3–10, 2003.

APPENDIX

A. EXAMPLE MAP AND UPDATE

The map and update functions are expressed in JVM languages for Muppet as application-provided class implementations of Java interfaces called Mapper and Updater. Implementations of the interfaces are constructed using two parameters: a configuration object for the application and a string for the name of the map or update function being instantiated. (Because the same Mapper or Updater code can be reused as different map and update functions throughout an application, each map and update function in the application is identified by a unique name.)

Figures 3 and 4 (cf. M1 and U1 in Example 4, respectively) show an example of how these interfaces could be used in Java.

package com.walmartlabs.example;

import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.kosmix.muppet.application.Config;
import com.kosmix.muppet.application.binary.Mapper;
import com.kosmix.muppet.application.binary.PerformerUtilities;

public class RetailerMapper implements Mapper {
    private final Logger logger =
        LoggerFactory.getLogger(RetailerMapper.class);
    private final Charset charset = Charset.forName("UTF-8");

    // Patterns that recognize retailer names in checkin venue strings.
    private final Pattern walmart =
        Pattern.compile("(?i)\\s*wal.*mart.*");
    private final Pattern samsclub =
        Pattern.compile("(?i)\\s*sam.*s\\s*club\\s*");

    private String name;

    public RetailerMapper(Config config, String n) { name = n; }

    @Override
    public String getName() { return name; }

    // Publish each recognized retailer's checkin event to stream S_2,
    // keyed by the retailer name.
    @Override
    public void map(PerformerUtilities submitter,
                    String stream, byte[] key, byte[] event)
    {
        String checkin = new String(event, charset);
        String venue = getVenue(checkin);

        String retailer = null;
        if (walmart.matcher(venue).matches()) {
            retailer = "Walmart";
        } else if (samsclub.matcher(venue).matches()) {
            retailer = "Sam's Club";
        }

        if (retailer != null) {
            try {
                submitter.publish("S_2",
                    retailer.getBytes(charset), event);
            } catch (Exception e) {
                logger.error("Could not publish event: " +
                    e.toString());
            }
        }
    }

    private String getVenue(String checkin) {
        // actual checkin parsing would go here
        return "name of venue";
    }
}

Figure 3: An example Java-language Mapper


package com.walmartlabs.example;

import java.nio.charset.Charset;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.kosmix.muppet.application.Config;
import com.kosmix.muppet.application.binary.PerformerUtilities;
import com.kosmix.muppet.application.binary.Updater;

public class Counter implements Updater {
    private final Logger logger =
        LoggerFactory.getLogger(Counter.class);
    private final Charset charset = Charset.forName("UTF-8");

    private String name;

    public Counter(Config config, String n) { name = n; }

    @Override
    public String getName() { return name; }

    // Parse the running count from the slate (if any), increment it,
    // and write it back as the new slate value.
    @Override
    public void update(PerformerUtilities submitter,
                       String stream, byte[] key, byte[] event,
                       byte[] slate)
    {
        int count = 0;
        try {
            if (slate != null)
                count =
                    Integer.parseInt(new String(slate, charset));
        } catch (NumberFormatException e) {
            count = 0;
        }
        ++count;
        byte[] updatedSlate =
            Integer.toString(count).getBytes(charset);
        submitter.replaceSlate(updatedSlate);
    }
}

Figure 4: An example Java-language Updater
